# https://github.com/oxylabs/oxylabs-ai-studio-py Project Documentation

Generated: 2026-05-18 03:01:39 UTC

## Table of Contents

- [Installation Guide](#installation)
- [Quick Start Guide](#quickstart)
- [AI-Scraper Feature](#ai-scraper)
- [AI-Crawler Feature](#ai-crawler)
- [AI-Search Feature](#ai-search)
- [AI-Map Feature](#ai-map)
- [Browser Agent Feature](#browser-agent)
- [Client Architecture](#client-architecture)
- [Data Models](#data-models)
- [Configuration and Settings](#configuration-settings)
- [Error Handling and Logging](#error-handling-logging)

<a id='installation'></a>

## Installation Guide

### Related Pages

Related topics: [Quick Start Guide](#quickstart)

<details>
<summary>Relevant Source Files</summary>

The following source files were used to generate this page:

- [pyproject.toml](https://github.com/oxylabs/oxylabs-ai-studio-py/blob/main/pyproject.toml)
- [readme.md](https://github.com/oxylabs/oxylabs-ai-studio-py/blob/main/readme.md)
</details>

## Overview

This guide covers installing the **Oxylabs AI Studio Python SDK** (`oxylabs-ai-studio`). The SDK provides a Python interface to the Oxylabs AI Studio API services: AI-Scraper, AI-Crawler, AI-Search, AI-Map, and Browser Agent.

Source: [readme.md:1-10]()

## System Requirements

| Requirement | Value | Notes |
|-------------|-------|-------|
| Python | 3.10 or later | Earlier versions are not supported |
| Package manager | pip | Standard Python package installer |
| API key | Required | Must be obtained from Oxylabs AI Studio |

Source: [readme.md:10-11]()

## Prerequisites

Before installing the SDK, ensure your environment meets the following requirements:

### Python Version Check

```bash
python --version
# or
python3 --version
```

The output should show Python 3.10 or higher.
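The same check can be made from inside Python, which is convenient in a setup or bootstrap script. The following is a minimal sketch using only the standard library; the helper name `meets_minimum_python` is illustrative and not part of the SDK.

```python
import sys

MIN_PYTHON = (3, 10)  # minimum version stated in the requirements table


def meets_minimum_python(version_info=None):
    """Return True if the given (or running) interpreter is at least MIN_PYTHON."""
    if version_info is None:
        version_info = sys.version_info
    # Compare only (major, minor); tuples compare element by element.
    return tuple(version_info[:2]) >= MIN_PYTHON


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    status = "supported" if meets_minimum_python() else "too old"
    print(f"Python {current}: {status}")
```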

### pip Availability

```bash
pip --version
# or
pip3 --version
```

## Installation Methods

### Standard Installation (Recommended)

The official release version can be installed directly from PyPI using pip:

```bash
pip install oxylabs-ai-studio
```

Source: [readme.md:14]()

### Installation from Source

For development or testing purposes, you can install from the source repository:

```bash
git clone https://github.com/oxylabs/oxylabs-ai-studio-py.git
cd oxylabs-ai-studio-py
pip install -e .
```

## Post-Installation Verification

After installation, verify the SDK is properly installed:

```python
from oxylabs_ai_studio.apps.ai_scraper import AiScraper
from oxylabs_ai_studio.apps.ai_crawler import AiCrawler
from oxylabs_ai_studio.apps.ai_search import AiSearch
from oxylabs_ai_studio.apps.browser_agent import BrowserAgent
from oxylabs_ai_studio.apps.ai_map import AiMap

print("Oxylabs AI Studio SDK imported successfully")
```

If no import errors occur, the installation was successful.
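If you would rather check for the package without importing every application module, the standard library can report whether it is present and which version is installed. This is a sketch under the assumption that the import name is `oxylabs_ai_studio` and the distribution name is `oxylabs-ai-studio`, as used elsewhere in this guide; `describe_install` is a hypothetical helper.

```python
from importlib import metadata, util


def describe_install(module_name, dist_name):
    """Return (installed, version) for a package without importing it."""
    # find_spec returns None when a top-level module cannot be located.
    if util.find_spec(module_name) is None:
        return False, None
    try:
        return True, metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        # Importable but no distribution metadata (e.g. a stdlib module).
        return True, None


if __name__ == "__main__":
    installed, version = describe_install("oxylabs_ai_studio", "oxylabs-ai-studio")
    print(f"installed={installed}, version={version}")
```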

## SDK Components

The SDK includes the following main components:

| Component | Module Path | Purpose |
|-----------|-------------|---------|
| AI Scraper | `oxylabs_ai_studio.apps.ai_scraper` | Scrape website content with AI |
| AI Crawler | `oxylabs_ai_studio.apps.ai_crawler` | Crawl and extract data from sites |
| AI Search | `oxylabs_ai_studio.apps.ai_search` | Perform AI-powered SERP searches |
| Browser Agent | `oxylabs_ai_studio.apps.browser_agent` | Automate browser-based tasks |
| AI Map | `oxylabs_ai_studio.apps.ai_map` | Map website structures |

Source: [readme.md:6-7]()

## Quick Start Configuration

After installation, you need to configure your API key to use the SDK:

```python
from oxylabs_ai_studio.apps.ai_scraper import AiScraper

# Initialize with your API key
scraper = AiScraper(api_key="<YOUR_API_KEY>")

# Example usage
result = scraper.scrape(
    url="https://example.com",
    output_format="markdown"
)
```

Replace `<YOUR_API_KEY>` with your actual Oxylabs AI Studio API key.
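To keep the key out of source control, one common approach is to read it from an environment variable and pass the value to the constructor. The sketch below assumes a variable named `OXYLABS_AI_STUDIO_API_KEY`; that name and the `load_api_key` helper are assumptions for illustration, so adjust them to your deployment.

```python
import os

# Assumed variable name for this sketch; use whatever your deployment defines.
API_KEY_ENV_VAR = "OXYLABS_AI_STUDIO_API_KEY"


def load_api_key(env=None, var_name=API_KEY_ENV_VAR):
    """Read the API key from the environment and fail early if it is missing."""
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable first")
    return key
```

With this in place, the scraper could be created as `AiScraper(api_key=load_api_key())`.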

## Environment Setup Recommendations

### Virtual Environment (Recommended)

For isolated development, use a virtual environment:

```bash
# Create virtual environment
python -m venv ai-studio-env

# Activate on Linux/macOS
source ai-studio-env/bin/activate

# Activate on Windows
ai-studio-env\Scripts\activate

# Install SDK
pip install oxylabs-ai-studio
```
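To confirm that the interpreter you are about to install into really is the virtual environment, you can compare `sys.prefix` with `sys.base_prefix`. This is a small standard-library sketch; `in_virtualenv` is an illustrative name.

```python
import sys


def in_virtualenv():
    """Return True when the interpreter is running inside a venv."""
    # In a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
```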

### Using pyproject.toml

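If you manage your own project with `pyproject.toml`, declare the SDK as a dependency. Use your project's own name and a valid version string:

```toml
[project]
name = "my-scraping-project"  # your project's name, not the SDK's
version = "0.1.0"
requires-python = ">=3.10"
dependencies = [
    "oxylabs-ai-studio",
]
```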

## Dependencies

The SDK relies on the following third-party packages, which pip installs automatically:

- `httpx` - HTTP client for API requests
- `pydantic` - Data validation using Python type hints

It also uses standard library modules such as `time`, `asyncio`, and `logging`, which need no installation.

Source: [pyproject.toml](https://github.com/oxylabs/oxylabs-ai-studio-py/blob/main/pyproject.toml)

## Troubleshooting

### Common Issues

| Issue | Solution |
|-------|----------|
| ImportError | Confirm the package is installed in the active environment and that Python 3.10+ is in use |
| AuthenticationError | Verify API key is correct and active |
| TimeoutError | Check network connectivity |
| pip install fails | Try upgrading pip: `pip install --upgrade pip` |

### Upgrade Instructions

To upgrade to the latest version:

```bash
pip install --upgrade oxylabs-ai-studio
```

## Related Documentation

- [Oxylabs AI Studio](https://aistudio.oxylabs.io/) - Official product page
- [API Documentation](https://developers.oxylabs.io/) - Detailed API reference
- [Discord Community](https://discord.gg/Pds3gBmKMH) - Get help from the community

---

<a id='quickstart'></a>

## Quick Start Guide

### Related Pages

Related topics: [Installation Guide](#installation), [AI-Scraper Feature](#ai-scraper), [AI-Crawler Feature](#ai-crawler)

<details>
<summary>Relevant Source Files</summary>

The following source files were used to generate this page:

- [readme.md](https://github.com/oxylabs/oxylabs-ai-studio-py/blob/main/readme.md)
- [agentic_code_guide.md](https://github.com/oxylabs/oxylabs-ai-studio-py/blob/main/agentic_code_guide.md)
- [examples/scrape_markdown.py](https://github.com/oxylabs/oxylabs-ai-studio-py/blob/main/examples/scrape_markdown.py)
- [examples/crawl_markdown.py](https://github.com/oxylabs/oxylabs-ai-studio-py/blob/main/examples/crawl_markdown.py)
- [examples/search_instant.py](https://github.com/oxylabs/oxylabs-ai-studio-py/blob/main/examples/search_instant.py)
- [src/oxylabs_ai_studio/apps/ai_scraper.py](https://github.com/oxylabs/oxylabs-ai-studio-py/blob/main/src/oxylabs_ai_studio/apps/ai_scraper.py)
- [src/oxylabs_ai_studio/apps/ai_crawler.py](https://github.com/oxylabs/oxylabs-ai-studio-py/blob/main/src/oxylabs_ai_studio/apps/ai_crawler.py)
</details>

## Overview

The **Oxylabs AI Studio Python SDK** (`oxylabs-ai-studio`) provides a simple Python interface for interacting with [Oxylabs AI Studio API](https://aistudio.oxylabs.io/) services. This SDK enables developers to integrate AI-powered web scraping, crawling, search, and browser automation capabilities into their Python applications with minimal configuration.

**Key Features:**

- AI-Scraper: Extract structured data from web pages using natural language prompts
- AI-Crawler: Automatically discover and crawl related pages starting from a URL
- AI-Search: Perform SERP (Search Engine Results Page) searches with content extraction
- Browser-Agent: Automate browser actions (clicks, scrolls, navigation) via prompts
- AI-Map: Discover website structure and find pages matching specific keywords

**Requirements:**

| Requirement | Version |
|-------------|---------|
| Python | 3.10+ |
| API Key | Valid Oxylabs AI Studio API key |

Source: [readme.md:1-15]()

---

## Installation

Install the SDK using pip:

```bash
pip install oxylabs-ai-studio
```

Source: [readme.md:16-19]()

---

## Core Concepts

### Authentication

All applications in the SDK require an API key for authentication. You can obtain an API key from the [Oxylabs AI Studio dashboard](https://aistudio.oxylabs.io/).

### Application Classes

The SDK provides five main application classes, each located in `oxylabs_ai_studio.apps`:

| Application | Class | Purpose |
|-------------|-------|---------|
| AI-Scraper | `AiScraper` | Single-page content extraction |
| AI-Crawler | `AiCrawler` | Multi-page website crawling |
| AI-Search | `AiSearch` | Search engine results extraction |
| Browser-Agent | `BrowserAgent` | Browser automation |
| AI-Map | `AiMap` | Website structure discovery |

### Output Formats

Supported output formats vary by application:

| Format | Description | Applicable Apps |
|--------|-------------|-----------------|
| `json` | Structured JSON data (requires schema) | All apps |
| `markdown` | Markdown formatted text | All apps |
| `html` | Raw HTML content | Scraper, Browser-Agent |
| `screenshot` | Base64-encoded screenshot | Scraper, Browser-Agent |
| `csv` | Comma-separated values | Scraper, Crawler |
| `toon` | AI-defined format | Scraper, Crawler |

---

## Quick Start Workflows

### SDK Initialization Flow

```mermaid
graph TD
    A[Install SDK] --> B[Import Application Class]
    B --> C[Initialize with API Key]
    C --> D[Call Method<br/>scrape/crawl/search/run]
    D --> E[Receive AiXxxJob Response]
    E --> F[Access result.data]
    
    style A fill:#e1f5fe
    style E fill:#fff3e0
    style F fill:#e8f5e8
```

---

## Usage Examples

### AI-Scraper: Basic Scraping

Scrape a single URL and extract content in Markdown format:

```python
from oxylabs_ai_studio.apps.ai_scraper import AiScraper

scraper = AiScraper(api_key="<API_KEY>")

url = "https://sandbox.oxylabs.io/products/1"
result = scraper.scrape(
    url=url,
    output_format="markdown",
    render_javascript=False,
    geo_location="Germany",
)
print(result)
```

Source: [examples/scrape_markdown.py:1-14]()

### AI-Scraper: Structured JSON Extraction

Extract structured data using a JSON schema:

```python
from oxylabs_ai_studio.apps.ai_scraper import AiScraper

scraper = AiScraper(api_key="<API_KEY>")

schema = scraper.generate_schema(
    prompt="want to parse developer, platform, type, price game title, genre (array) and description"
)
print(f"Generated schema: {schema}")

url = "https://sandbox.oxylabs.io/products/3"
result = scraper.scrape(
    url=url,
    output_format="json",
    schema=schema,
    render_javascript=False,
)
print(result)
```

Source: [examples/scrape_generated_schema.py:1-19]()

**Parameters for `AiScraper.scrape`:**

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `url` | `str` | Yes | - | Target URL to scrape |
| `output_format` | `Literal["json", "markdown", "csv", "screenshot", "toon"]` | No | `"markdown"` | Output format |
| `schema` | `dict \| None` | Conditional | `None` | JSON schema (required for `json`, `csv`, `toon`) |
| `render_javascript` | `bool` | No | `False` | Enable JavaScript rendering |
| `geo_location` | `str` | No | `None` | Proxy location (ISO2 or country name) |

Source: [readme.md:55-68]()

---

### AI-Crawler: Website Crawling

Crawl a website starting from a specific URL:

```python
from oxylabs_ai_studio.apps.ai_crawler import AiCrawler

crawler = AiCrawler(api_key="<API_KEY>")

url = "https://oxylabs.io"
result = crawler.crawl(
    url=url,
    user_prompt="Find all pages with proxy products pricing",
    output_format="markdown",
    render_javascript=False,
    return_sources_limit=3,
    geo_location="France",
)
print("Results:")
for item in result.data:
    print(item, "\n")
```

Source: [examples/crawl_markdown.py:1-18]()

### AI-Crawler: Structured Extraction with Pydantic Schema

Use Pydantic models for type-safe schema definition:

```python
from pydantic import BaseModel, Field
from oxylabs_ai_studio.apps.ai_crawler import AiCrawler

crawler = AiCrawler(api_key="<API_KEY>")

class ProxyPlan(BaseModel):
    name: str = Field(description="The name of the proxy plan")
    price: str = Field(description="The price of the proxy plan")
    features: list[str] = Field(description="The features of the proxy plan")

class ProxyPlans(BaseModel):
    proxy_plans: list[ProxyPlan] = Field(description="The proxy plans")

url = "https://oxylabs.io/"
result = crawler.crawl(
    url=url,
    user_prompt="Find all pages with proxy products pricing",
    output_format="json",
    schema=ProxyPlans.model_json_schema(),
    render_javascript=False,
)
print("Results:\n")
for item in result.data:
    print(item, "\n")
```

Source: [examples/crawl_pydantic_schema.py:1-30]()

**Parameters for `AiCrawler.crawl`:**

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `url` | `str` | Yes | - | Starting URL to crawl |
| `user_prompt` | `str` | Yes | - | Natural language prompt to guide extraction |
| `output_format` | `Literal["json", "markdown", "csv", "toon"]` | No | `"markdown"` | Output format |
| `schema` | `dict \| None` | Conditional | `None` | JSON schema (required for `json`, `csv`, `toon`) |
| `render_javascript` | `bool` | No | `False` | Enable JavaScript rendering |
| `return_sources_limit` | `int` | No | `25` | Max number of sources to return |
| `geo_location` | `str` | No | `None` | Proxy location (ISO2 or country name) |
| `max_credits` | `int \| None` | No | `None` | Maximum credits to use |

Source: [readme.md:33-52]()

---

### AI-Search: Search Engine Results

Perform a search with full content extraction:

```python
from oxylabs_ai_studio.apps.ai_search import AiSearch

search = AiSearch(api_key="<API_KEY>")

query = "lasagna recipe"
result = search.search(
    query=query,
    limit=5,
    render_javascript=False,
    return_content=True,
)
print(result.data)
```

Source: [readme.md:54-67]()

### AI-Search: Instant Search

For fast results without content (up to 10 results):

```python
from oxylabs_ai_studio.apps.ai_search import AiSearch

search = AiSearch(api_key="<API_KEY>")

query = "lasagna recipes"
result = search.instant_search(
    query=query,
    limit=5,
    geo_location="United States",
)
print(result.data)
```

Source: [examples/search_instant.py:1-13]()

### AI-Search: Results Without Content

Optimize for speed by disabling content extraction:

```python
from oxylabs_ai_studio.apps.ai_search import AiSearch

search = AiSearch(api_key="<API_KEY>")

query = "lasagna"
result = search.search(
    query=query,
    limit=5,
    render_javascript=False,
    return_content=False,
    geo_location="Italy",
)
print(result.data)
```

Source: [examples/search_no_content.py:1-14]()

**Parameters for `AiSearch.search`:**

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `query` | `str` | Yes | - | What to search for |
| `limit` | `int` | No | `10` | Max results (max: 50) |
| `render_javascript` | `bool` | No | `False` | Enable JavaScript rendering |
| `return_content` | `bool` | No | `True` | Include markdown content in results |
| `geo_location` | `str` | No | `None` | ISO 2-letter format or country name |

**Instant Search Parameters:**

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `query` | `str` | Yes | - | The search query |
| `limit` | `int` | No | `10` | Max results (max: 10) |
| `geo_location` | `str` | No | `None` | Google's canonical location name |

> **Note:** When `limit <= 10` and `return_content=False`, the search automatically uses the instant endpoint (`/search/instant`) which returns results immediately without polling.

Source: [readme.md:68-95]()
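The endpoint selection rule in the note above can be expressed as a small pure function. This is only a sketch of the stated rule, not the SDK's internal code: `/search/instant` comes from the note, while `/search/run` and the function name are hypothetical placeholders.

```python
INSTANT_ENDPOINT = "/search/instant"  # named in the note above
POLLED_ENDPOINT = "/search/run"       # hypothetical name, for illustration only
INSTANT_MAX_LIMIT = 10


def choose_search_endpoint(limit=10, return_content=True):
    """Pick the endpoint according to the rule in the note above."""
    if limit <= INSTANT_MAX_LIMIT and not return_content:
        return INSTANT_ENDPOINT
    return POLLED_ENDPOINT
```

The practical consequence is that a small `limit` combined with `return_content=False` should return without any polling delay.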

---

## Response Data Models

### Response Flow

```mermaid
graph LR
    A[API Request] --> B{AI Studio API}
    B --> C{Status Check}
    C -->|Processing| D[Poll for completion]
    C -->|Completed| E[Return AiXxxJob]
    C -->|Failed| F[Return with error]
    D --> C
    
    style E fill:#e8f5e8
    style F fill:#ffebee
```

### AiScraperJob

```python
class AiScraperJob(BaseModel):
    run_id: str
    message: str | None = None
    data: str | dict | None
```

| Field | Type | Description |
|-------|------|-------------|
| `run_id` | `str` | Unique identifier for the job |
| `message` | `str \| None` | Error code or status message |
| `data` | `str \| dict \| None` | Result data (type depends on `output_format`) |

Source: [agentic_code_guide.md:95-104]()

### BrowserAgentJob

```python
class DataModel(BaseModel):
    type: Literal["json", "markdown", "html", "screenshot", "csv"]
    content: dict[str, Any] | str | None

class BrowserAgentJob(BaseModel):
    run_id: str
    message: str | None = None
    data: DataModel | None = None
```

Source: [agentic_code_guide.md:27-35]()

---

## Best Practices

### Rate Limiting

Implement rate limiting in your application to respect the limits associated with your purchased plan. This prevents service disruptions or overuse.
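One simple client-side approach is to enforce a minimum interval between consecutive calls. The class below is a sketch, not an SDK feature: `MinIntervalLimiter` is a hypothetical name, and the right interval depends on the limits of your plan.

```python
import time


class MinIntervalLimiter:
    """Allow at most one call every `min_interval` seconds (client-side)."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last_call = None

    def wait(self):
        """Block until the next call is allowed, then record the call time."""
        now = self._clock()
        if self._last_call is not None:
            remaining = self.min_interval - (now - self._last_call)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last_call = now
```

Calling `limiter.wait()` immediately before each SDK request spaces the requests out. The `clock` and `sleep` arguments exist only so the behavior can be tested without real delays.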

### Retry Mechanism

Implement retry logic for handling failed requests, but include a limit on the number of retries to avoid:

- Infinite loops
- Excessive API calls
- Unnecessary costs

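```python
import time

from oxylabs_ai_studio.apps.ai_scraper import AiScraper

MAX_RETRIES = 3
RETRY_DELAY = 5  # seconds between attempts

scraper = AiScraper(api_key="<API_KEY>")
url = "https://sandbox.oxylabs.io/products/1"

for attempt in range(MAX_RETRIES):
    try:
        result = scraper.scrape(url=url, output_format="markdown")
        break  # success: stop retrying
    except TimeoutError:
        if attempt < MAX_RETRIES - 1:
            time.sleep(RETRY_DELAY)  # wait before the next attempt
        else:
            raise  # retries exhausted: surface the error
```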

### Schema Generation

Use the built-in `generate_schema()` method to automatically create JSON schemas from natural language prompts:

```python
schema = scraper.generate_schema(
    prompt="want to parse developer, platform, type, price game title, genre (array) and description"
)
```

This approach is recommended over manually writing JSON schemas.

Source: [agentic_code_guide.md:7-18]()

### JavaScript Rendering

- Set `render_javascript=False` for static pages to improve performance
- Use `render_javascript=True` for Single Page Applications (SPAs) or pages with dynamic content
- The `AiScraper` also supports `render_javascript="auto"` for automatic detection

---

## Next Steps

After completing this quick start guide, explore these topics:

1. **Advanced Configuration**: Configure timeouts, custom headers, and proxy settings
2. **Error Handling**: Implement robust error handling for production applications
3. **Async Usage**: Use async/await patterns for concurrent operations
4. **Use Cases**: Review [use case examples](https://github.com/oxylabs/oxylabs-ai-studio-py/blob/main/agentic_code_guide.md#use-cases-examples) for common workflows

---

## Summary

The Oxylabs AI Studio Python SDK provides a streamlined interface for AI-powered web data extraction. With just an API key and a few lines of code, you can:

| Capability | SDK Component | Primary Method |
|------------|---------------|----------------|
| Extract data from single pages | `AiScraper` | `scrape()` |
| Crawl entire websites | `AiCrawler` | `crawl()` |
| Query search engines | `AiSearch` | `search()` / `instant_search()` |
| Automate browser actions | `BrowserAgent` | `run()` / `run_async()` |
| Discover site structure | `AiMap` | `map()` |

---

<a id='ai-scraper'></a>

## AI-Scraper Feature

### Related Pages

Related topics: [AI-Crawler Feature](#ai-crawler), [Data Models](#data-models)

<details>
<summary>Relevant Source Files</summary>

The following source files were used to generate this page:

- [src/oxylabs_ai_studio/apps/ai_scraper.py](https://github.com/oxylabs/oxylabs-ai-studio-py/blob/main/src/oxylabs_ai_studio/apps/ai_scraper.py)
- [examples/scrape_markdown.py](https://github.com/oxylabs/oxylabs-ai-studio-py/blob/main/examples/scrape_markdown.py)
- [examples/scrape_pydantic_schema.py](https://github.com/oxylabs/oxylabs-ai-studio-py/blob/main/examples/scrape_pydantic_schema.py)
- [examples/scrape_generated_schema.py](https://github.com/oxylabs/oxylabs-ai-studio-py/blob/main/examples/scrape_generated_schema.py)
</details>

## Overview

The **AI-Scraper** is a core feature of the Oxylabs AI Studio Python SDK designed to scrape website content and return extracted data in multiple formats. It leverages AI capabilities to intelligently extract structured or unstructured data from web pages based on natural language prompts or JSON schemas.

### Purpose and Scope

The AI-Scraper provides the following capabilities:

- **Flexible Output Formats**: Supports Markdown, JSON, CSV, and screenshot output
- **Schema-Based Extraction**: Enables structured data extraction using JSON schemas or Pydantic models
- **AI-Powered Parsing**: Uses natural language prompts to guide data extraction
- **JavaScript Rendering**: Supports pages requiring client-side rendering
- **Geo-Location Targeting**: Allows scraping from specific geographic locations

## Architecture

```mermaid
graph TD
    A[User Request] --> B[AiScraper Class]
    B --> C{Output Format}
    C -->|markdown| D[Markdown Parser]
    C -->|json| E[Schema Validator]
    C -->|csv| F[CSV Formatter]
    C -->|screenshot| G[Screenshot Capture]
    D --> H[API Endpoint]
    E --> H
    F --> H
    G --> H
    H --> I[Oxylabs API]
    I --> J[Response Handler]
    J --> K[Structured Data]
```

## Core Components

### AiScraper Class

The main interface for web scraping operations. The class provides both synchronous and asynchronous methods for scraping web content.

**Import Statement:**
```python
from oxylabs_ai_studio.apps.ai_scraper import AiScraper
```

**Initialization:**
```python
scraper = AiScraper(api_key="<API_KEY>")
```

### Key Methods

| Method | Description | Type |
|--------|-------------|------|
| `scrape()` | Synchronous scraping operation | Sync |
| `scrape_async()` | Asynchronous scraping operation | Async |
| `generate_schema()` | Auto-generate JSON schema from prompt | Helper |

## API Parameters

### Required Parameters

| Parameter | Type | Description |
|-----------|------|-------------|
| `url` | `str` | Target URL to scrape |

### Optional Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `output_format` | `Literal["json", "markdown", "csv", "screenshot"]` | `"markdown"` | Desired output format |
| `schema` | `dict \| None` | `None` | JSON schema for structured extraction (required for "json" and "csv" formats) |
| `render_javascript` | `bool \| Literal["auto"]` | `False` | Enable JavaScript rendering; see [JavaScript Rendering](#javascript-rendering) for the `"auto"` value |
| `geo_location` | `str \| None` | `None` | Proxy location in ISO2 format or country name |
## Usage Patterns

### Basic Markdown Scraping

The simplest use case extracts page content as Markdown without requiring a schema.

**Example** ([examples/scrape_markdown.py](examples/scrape_markdown.py)):
```python
from oxylabs_ai_studio.apps.ai_scraper import AiScraper

scraper = AiScraper(api_key="<API_KEY>")

url = "https://sandbox.oxylabs.io/products/1"
result = scraper.scrape(
    url=url,
    output_format="markdown",
    render_javascript=False,
    geo_location="Germany",
)
print(result)
```

### Schema-Based JSON Extraction

For structured data extraction, provide a JSON schema defining the expected output structure.

**Example** ([examples/scrape_generated_schema.py](examples/scrape_generated_schema.py)):
```python
from oxylabs_ai_studio.apps.ai_scraper import AiScraper

scraper = AiScraper(api_key="<API_KEY>")

schema = scraper.generate_schema(
    prompt="want to parse developer, platform, type, price game title, genre (array) and description"
)
print(f"Generated schema: {schema}")

url = "https://sandbox.oxylabs.io/products/3"
result = scraper.scrape(
    url=url,
    output_format="json",
    schema=schema,
    render_javascript=False,
)
print(result)
```

### Pydantic Model Integration

For type-safe extraction, use Pydantic models which are automatically converted to JSON schemas.

**Example** ([examples/scrape_pydantic_schema.py](examples/scrape_pydantic_schema.py)):
```python
from pydantic import BaseModel
from oxylabs_ai_studio.apps.ai_scraper import AiScraper

scraper = AiScraper(api_key="<API_KEY>")

class Game(BaseModel):
    title: str
    genre: list[str]
    developer: str
    platform: str
    game_type: str
    description: str
    price: str
    availability: str

url = "https://sandbox.oxylabs.io/products/1"
result = scraper.scrape(
    url=url,
    output_format="json",
    schema=Game.model_json_schema(),
    render_javascript=False,
)
print(result)
```

## Async Usage

### Async Interface

For high-performance applications, use the async interface:

```python
import asyncio
from oxylabs_ai_studio.apps.ai_scraper import AiScraper

scraper = AiScraper(api_key="<API_KEY>")

async def main():
    url = "https://sandbox.oxylabs.io/products/3"
    result = await scraper.scrape_async(
        url=url,
        output_format="json",
        schema={"type": "object", "properties": {"price": {"type": "string"}}, "required": []},
        render_javascript=False,
    )
    print(result)

if __name__ == "__main__":
    asyncio.run(main())
```
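When scraping several URLs, the async interface is typically combined with a concurrency cap so that requests do not all start at once. The sketch below shows the pattern with `asyncio.Semaphore` and `asyncio.gather`; `fake_scrape` is a stand-in for `scraper.scrape_async(...)` so the example runs without network access, and `gather_limited` is a hypothetical helper name.

```python
import asyncio


async def gather_limited(worker, items, max_concurrency=3):
    """Run `worker(item)` for every item, at most `max_concurrency` at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(item):
        async with semaphore:
            return await worker(item)

    # gather preserves input order in its result list.
    return await asyncio.gather(*(guarded(item) for item in items))


async def fake_scrape(url):
    """Stand-in for `scraper.scrape_async(url=...)`; no network access."""
    await asyncio.sleep(0)
    return {"url": url, "ok": True}


if __name__ == "__main__":
    urls = [f"https://sandbox.oxylabs.io/products/{i}" for i in (1, 2, 3)]
    for row in asyncio.run(gather_limited(fake_scrape, urls, max_concurrency=2)):
        print(row)
```

In real use, `worker` would be a small coroutine that calls `scraper.scrape_async` with the arguments you need.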

## Response Data Model

### AiScraperJob Structure

| Field | Type | Description |
|-------|------|-------------|
| `run_id` | `str` | Unique identifier for the scraping job |
| `message` | `str \| None` | Status message or error description |
| `data` | `dict \| str \| None` | Extracted data based on output format |

### Data Type by Output Format

| Output Format | Data Type | Description |
|---------------|-----------|-------------|
| `json` | `dict` | Parsed JSON object |
| `markdown` | `str` | HTML content converted to Markdown |
| `csv` | `str` | Comma-separated values string |
| `screenshot` | `str` | Base64-encoded image data |
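
Because `csv` and `screenshot` results arrive as strings, a caller usually converts them before use. The helper below is a sketch based on the table above. It assumes that the CSV string starts with a header row and that the screenshot is plain Base64 without a `data:` prefix; neither assumption is confirmed by the SDK sources listed here, and `decode_data` is a hypothetical name.

```python
import base64
import csv
import io


def decode_data(output_format, data):
    """Convert the raw `data` field into a convenient Python value."""
    if data is None:
        return None
    if output_format == "csv":
        # Assumes the first row is a header row.
        return list(csv.DictReader(io.StringIO(data)))
    if output_format == "screenshot":
        # Assumes plain base64 without a "data:image/..." prefix.
        return base64.b64decode(data)
    # "json" is already a dict and "markdown" is already a str.
    return data
```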

## Schema Generation

The AI-Scraper provides a `generate_schema()` helper method that uses AI to create appropriate JSON schemas from natural language prompts.

```python
schema = scraper.generate_schema(
    prompt="proxy plans which have name, price, and features"
)
```

**Parameters:**

| Parameter | Type | Description |
|-----------|------|-------------|
| `prompt` | `str` | Natural language description of desired data structure |

**Returns:** `dict` - A valid JSON schema object
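For orientation, a JSON schema for the prompt above might look like the hand-written dictionary below. This is an illustration only: the schema actually returned by `generate_schema()` may differ in field names, descriptions, and required keys. `looks_like_object_schema` is a hypothetical sanity check you could run before passing a schema to `scrape()`.

```python
# Hand-written example; a generated schema may differ in wording and detail.
proxy_plans_schema = {
    "type": "object",
    "properties": {
        "proxy_plans": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "price": {"type": "string"},
                    "features": {"type": "array", "items": {"type": "string"}},
                },
                "required": ["name", "price"],
            },
        }
    },
    "required": ["proxy_plans"],
}


def looks_like_object_schema(schema):
    """Cheap structural check: an object schema with a properties mapping."""
    return (
        isinstance(schema, dict)
        and schema.get("type") == "object"
        and isinstance(schema.get("properties"), dict)
    )
```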

## Workflow Diagram

```mermaid
sequenceDiagram
    participant User
    participant AiScraper
    participant API
    participant Response

    User->>AiScraper: scrape(url, output_format, schema)
    AiScraper->>AiScraper: Validate parameters
    AiScraper->>API: POST request with payload
    API->>API: Process scraping request
    API->>Response: Return extracted data
    Response->>AiScraper: AiScraperJob response
    AiScraper->>User: Return result object
```

## Configuration Options

### JavaScript Rendering

The `render_javascript` parameter controls browser rendering behavior:

| Value | Behavior |
|-------|----------|
| `False` | No JavaScript rendering (default) |
| `True` | Always render JavaScript |
| `"auto"` | Service automatically detects if rendering is needed |

### Geo-Location

Specify geographic location for proxy-based scraping:

```python
result = scraper.scrape(
    url="https://example.com",
    geo_location="Germany",  # Country name
    # or "DE" for ISO2 format
)
```

## Error Handling

When a scraping operation fails, the response will include:

1. `run_id` - The job identifier for troubleshooting
2. `message` - Error description
3. `data` - `None` when an error occurs

Always check the `message` field before accessing `data`:

```python
result = scraper.scrape(url=url, output_format="json", schema=schema)
if result.message:
    print(f"Error: {result.message}")
else:
    print(result.data)
```
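In larger programs it can be convenient to turn this check into an exception so that failures cannot be silently ignored. The sketch below uses a dataclass stand-in with the same three fields as `AiScraperJob`; `JobResult`, `ScrapeFailed`, and `unwrap` are hypothetical names and are not provided by the SDK.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class JobResult:
    """Stand-in with the same three fields as `AiScraperJob`."""
    run_id: str
    message: Optional[str] = None
    data: Any = None


class ScrapeFailed(Exception):
    """Hypothetical application-level error; not defined by the SDK."""


def unwrap(job):
    """Return job.data, or raise with the run_id attached for troubleshooting."""
    if job.message or job.data is None:
        reason = job.message or "no data returned"
        raise ScrapeFailed(f"run {job.run_id} failed: {reason}")
    return job.data
```

Because `unwrap` only reads `run_id`, `message`, and `data`, it should work equally on a real job object returned by `scrape()`.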

## Best Practices

1. **Use Appropriate Schemas**: Always provide a valid JSON schema when using `output_format="json"` or `output_format="csv"`
2. **Enable JS Rendering When Needed**: Set `render_javascript=True` for SPAs and dynamic content
3. **Specify Geo-Location**: Use `geo_location` parameter when location-specific content is required
4. **Handle Errors Gracefully**: Always check the `message` field in the response

## Summary

The AI-Scraper extracts web content through a single class, `AiScraper`. It supports Markdown, JSON, CSV, and screenshot output, schema-based extraction from JSON schemas or Pydantic models, and both synchronous and asynchronous calls.

---

<a id='ai-crawler'></a>

## AI-Crawler Feature

### Related Pages

Related topics: [AI-Scraper Feature](#ai-scraper), [AI-Map Feature](#ai-map)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this documentation:

- [src/oxylabs_ai_studio/apps/ai_crawler.py](https://github.com/oxylabs/oxylabs-ai-studio-py/blob/main/src/oxylabs_ai_studio/apps/ai_crawler.py)
- [examples/crawl_markdown.py](https://github.com/oxylabs/oxylabs-ai-studio-py/blob/main/examples/crawl_markdown.py)
- [examples/crawl_pydantic_schema.py](https://github.com/oxylabs/oxylabs-ai-studio-py/blob/main/examples/crawl_pydantic_schema.py)
- [examples/crawl_generated_schema.py](https://github.com/oxylabs/oxylabs-ai-studio-py/blob/main/examples/crawl_generated_schema.py)
- [readme.md](https://github.com/oxylabs/oxylabs-ai-studio-py/blob/main/readme.md)
</details>

## Overview

The AI-Crawler is the web crawling and content extraction module of the Oxylabs AI Studio Python SDK. Guided by a natural language prompt, it starts from a given URL, discovers pages relevant to the prompt, and returns structured or unstructured data in multiple formats.

**Key Characteristics:**
- Natural language-based extraction guidance via `user_prompt`
- Multi-format output support (JSON, Markdown, CSV, Toon)
- JavaScript rendering capability for dynamic web pages
- Geographic localization through proxy positioning
- Schema-driven structured extraction with optional automatic schema generation
- Polling-based async job completion handling with configurable timeout

Source: [src/oxylabs_ai_studio/apps/ai_crawler.py:1-31]()

## Architecture

### Class Hierarchy

```mermaid
graph TD
    A[OxyStudioAIClient] --> B[AiCrawler]
    B --> C[AiCrawlerJob]
    
    B1[BaseModel] --> C
```

The `AiCrawler` class inherits from `OxyStudioAIClient`, which provides the underlying API client functionality including authentication, request handling, and response parsing.

Source: [src/oxylabs_ai_studio/apps/ai_crawler.py:23-31]()

### Data Models

```python
class AiCrawlerJob(BaseModel):
    run_id: str
    message: str | None = None
    data: list[dict[str, Any]] | list[str] | None = None
```

| Field | Type | Description |
|-------|------|-------------|
| `run_id` | `str` | Unique identifier for the crawl job |
| `message` | `str \| None` | Error code or status message if job failed |
| `data` | `list[dict[str, Any]] \| list[str] \| None` | Extracted content based on output format |

Source: [src/oxylabs_ai_studio/apps/ai_crawler.py:27-30]()

## Configuration Constants

| Constant | Value | Purpose |
|----------|-------|---------|
| `CRAWLER_TIMEOUT_SECONDS` | `600` (10 minutes) | Maximum time to wait for job completion |
| `POLL_INTERVAL_SECONDS` | `5` | Interval between status checks |
| `POLL_MAX_ATTEMPTS` | `120` | Maximum polling attempts before timeout |

Source: [src/oxylabs_ai_studio/apps/ai_crawler.py:12-14]()
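The relationship between these constants is easiest to see in a generic polling loop: 120 attempts at 5-second intervals add up to the 600-second timeout. The following is a sketch of the pattern, not the SDK's implementation; `fetch_status` is a hypothetical callable that returns a `(status, payload)` pair.

```python
import time

POLL_INTERVAL_SECONDS = 5
POLL_MAX_ATTEMPTS = 120  # 120 attempts x 5 s = 600 s, matching the timeout


def poll_until_done(fetch_status, interval=POLL_INTERVAL_SECONDS,
                    max_attempts=POLL_MAX_ATTEMPTS, sleep=time.sleep):
    """Call `fetch_status()` until it reports a final state or attempts run out.

    `fetch_status` is a hypothetical callable returning (status, payload),
    where status is "processing", "completed" or "failed".
    """
    for attempt in range(max_attempts):
        status, payload = fetch_status()
        if status in ("completed", "failed"):
            return status, payload
        if attempt < max_attempts - 1:
            sleep(interval)  # wait before the next status check
    raise TimeoutError(f"job still processing after {max_attempts} attempts")
```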

## Core Methods

### `crawl()`

The primary method for initiating a crawl operation.

```python
def crawl(
    self,
    url: str,
    user_prompt: str,
    output_format: Literal["json", "markdown", "csv", "toon"] = "markdown",
    schema: dict[str, Any] | None = None,
    render_javascript: bool = False,
    return_sources_limit: int = 25,
    geo_location: str | None = None,
    max_credits: int | None = None,
) -> AiCrawlerJob
```

#### Parameters

| Parameter | Type | Default | Required | Description |
|-----------|------|---------|----------|-------------|
| `url` | `str` | - | Yes | Starting URL to crawl |
| `user_prompt` | `str` | - | Yes | Natural language prompt to guide extraction |
| `output_format` | `Literal["json", "markdown", "csv", "toon"]` | `"markdown"` | No | Desired output format |
| `schema` | `dict[str, Any] \| None` | `None` | Conditional | JSON schema for structured extraction (required for `json`, `csv`, `toon` formats) |
| `render_javascript` | `bool` | `False` | No | Enable JavaScript rendering |
| `return_sources_limit` | `int` | `25` | No | Maximum number of sources to return |
| `geo_location` | `str \| None` | `None` | No | Proxy location in ISO2 format or country name |
| `max_credits` | `int \| None` | `None` | No | Maximum credits to consume |

#### Validation Rules

```python
if output_format in ["json", "csv", "toon"] and schema is None:
    raise ValueError(
        "openapi_schema is required when output_format is json, csv or toon.",
    )
```

When using `json`, `csv`, or `toon` output formats, a valid JSON schema must be provided. Markdown format does not require a schema.

Source: [src/oxylabs_ai_studio/apps/ai_crawler.py:47-52]()
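
The rule above can also be checked before any request is sent. The following is a minimal sketch, assuming two hypothetical helpers (`requires_schema` and `check_crawl_args`) that are not part of the SDK and only mirror the documented condition:

```python
from __future__ import annotations

from typing import Any

# Formats that the documented validation rule treats as schema-dependent.
SCHEMA_FORMATS = {"json", "csv", "toon"}


def requires_schema(output_format: str) -> bool:
    """Return True when the documented rule demands a schema."""
    return output_format in SCHEMA_FORMATS


def check_crawl_args(output_format: str, schema: dict[str, Any] | None) -> None:
    """Hypothetical pre-flight check mirroring the ValueError raised by crawl()."""
    if requires_schema(output_format) and schema is None:
        raise ValueError(
            "openapi_schema is required when output_format is json, csv or toon."
        )


check_crawl_args("markdown", None)  # passes: markdown needs no schema
check_crawl_args("json", {"type": "object"})  # passes: a schema is supplied
```

Running such a check locally is optional, since `crawl()` applies the same rule itself.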

### `generate_schema()`

Automatically generates a JSON schema based on a natural language prompt.

```python
def generate_schema(self, prompt: str) -> dict[str, Any] | None
```

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `prompt` | `str` | Yes | Natural language description of desired data structure |

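**Returns:** A dictionary containing the generated JSON schema. The return annotation also permits `None`, so callers should check the result before passing it to `crawl()`.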

**Process Flow:**
1. Sends prompt to `/crawl/generate-params` endpoint
2. Validates response status code (must be 200)
3. Parses and returns schema response

Source: [src/oxylabs_ai_studio/apps/ai_crawler.py:89-103]()

## Workflow

```mermaid
sequenceDiagram
    participant User
    participant AiCrawler
    participant API
    participant PollService
    
    User->>AiCrawler: crawl(url, user_prompt, output_format, schema)
    AiCrawler->>API: POST /crawl/run
    API-->>AiCrawler: run_id
    AiCrawler->>PollService: Start polling
    PollService->>API: GET /crawl/run/data?run_id=xxx
    alt Status: processing
        API-->>PollService: 202 Accepted
        PollService->>PollService: wait(POLL_INTERVAL_SECONDS)
        PollService->>API: GET /crawl/run/data
    end
    alt Status: completed
        API-->>PollService: 200 + data
        PollService-->>User: AiCrawlerJob(data)
    else Status: failed
        API-->>PollService: 200 + failed status
        PollService-->>User: AiCrawlerJob(message=error)
    else Timeout
        PollService-->>User: TimeoutError
    end
```

### Job Completion States

| Status | Response | Action |
|--------|----------|--------|
| `processing` | `202` | Continue polling at `POLL_INTERVAL_SECONDS` |
| `completed` | `200` with `data` | Return `AiCrawlerJob` with extracted data |
| `failed` | `200` with `failed` | Return `AiCrawlerJob` with error message |
| Timeout | After 10 minutes | Raise `TimeoutError` |
| KeyboardInterrupt | User cancels | Log and re-raise `KeyboardInterrupt` |

Source: [src/oxylabs_ai_studio/apps/ai_crawler.py:54-85]()
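
The polling behaviour in the table above can be sketched as a small loop. This is an illustration only, not the SDK's implementation: `fetch` is a hypothetical callable that returns an HTTP status code and a JSON body, and the body keys `status`, `data`, and `error_code` are assumptions made for the example.

```python
from __future__ import annotations

import time
from typing import Any, Callable

POLL_INTERVAL_SECONDS = 5
POLL_MAX_ATTEMPTS = 120  # 600 seconds / 5 seconds per attempt


def poll_until_done(
    fetch: Callable[[], tuple[int, dict[str, Any]]],
    interval: float = POLL_INTERVAL_SECONDS,
    max_attempts: int = POLL_MAX_ATTEMPTS,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Poll `fetch` until it reports a terminal state or attempts run out."""
    for _ in range(max_attempts):
        status_code, body = fetch()
        if status_code == 202:  # still processing: wait, then try again
            sleep(interval)
            continue
        if status_code == 200 and body.get("status") == "completed":
            return {"data": body.get("data"), "message": None}
        if status_code == 200 and body.get("status") == "failed":
            # Assumed key name for the error code; the real body may differ.
            return {"data": None, "message": body.get("error_code")}
        raise RuntimeError(f"Unexpected response: {status_code}")
    raise TimeoutError("Job did not finish within the polling budget")
```

Passing `sleep` as a parameter keeps the loop testable without real waiting.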

## Usage Examples

### Basic Markdown Crawl

```python
from oxylabs_ai_studio.apps.ai_crawler import AiCrawler

crawler = AiCrawler(api_key="<API_KEY>")

url = "https://oxylabs.io"
result = crawler.crawl(
    url=url,
    user_prompt="Find all pages with proxy products pricing",
    output_format="markdown",
    render_javascript=False,
    return_sources_limit=3,
    geo_location="France",
)
print("Results:")
for item in result.data or []:  # data may be None
    print(item, "\n")
```

Source: [examples/crawl_markdown.py:1-18]()

### JSON Extraction with Generated Schema

```python
from oxylabs_ai_studio.apps.ai_crawler import AiCrawler

crawler = AiCrawler(api_key="<API_KEY>")

schema = crawler.generate_schema(
    prompt="proxy plans which have name, price, and features",
)
print("schema: ", schema)

url = "https://oxylabs.io"
result = crawler.crawl(
    url=url,
    user_prompt="Find all pages with proxy products pricing",
    output_format="json",
    schema=schema,
    render_javascript=False,
)
print("Results:")
for item in result.data or []:  # data may be None
    print(item, "\n")
```

Source: [examples/crawl_generated_schema.py:1-24]()

### Structured Extraction with Pydantic Schema

```python
from pydantic import BaseModel, Field
from oxylabs_ai_studio.apps.ai_crawler import AiCrawler

crawler = AiCrawler(api_key="<API_KEY>")

class ProxyPlan(BaseModel):
    name: str = Field(description="The name of the proxy plan")
    price: str = Field(description="The price of the proxy plan")
    features: list[str] = Field(description="The features of the proxy plan")

class ProxyPlans(BaseModel):
    proxy_plans: list[ProxyPlan] = Field(description="The proxy plans")

url = "https://oxylabs.io/"
result = crawler.crawl(
    url=url,
    user_prompt="Find all pages with proxy products pricing",
    output_format="json",
    schema=ProxyPlans.model_json_schema(),
    render_javascript=False,
)
```

Source: [examples/crawl_pydantic_schema.py:1-28]()

## Output Formats

| Format | Schema Required | Data Type in `AiCrawlerJob.data` | Use Case |
|--------|-----------------|----------------------------------|----------|
| `markdown` | No | `list[str]` | Content summarization, human-readable output |
| `json` | Yes | `list[dict[str, Any]]` | Structured data processing, API integration |
| `csv` | Yes | `list[dict[str, Any]]` | Spreadsheet imports, tabular analysis |
| `toon` | Yes | `list[dict[str, Any]]` | Specialized structured format |

Source: [src/oxylabs_ai_studio/apps/ai_crawler.py:41-46]()
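
Because `data` holds strings for `markdown` and dictionaries for the structured formats, consumer code usually has to branch on the item type. A minimal sketch, assuming a hypothetical helper `describe_items` that is not part of the SDK:

```python
from __future__ import annotations

from typing import Any


def describe_items(data: list[dict[str, Any]] | list[str] | None) -> list[str]:
    """Return one short, printable line per item regardless of output format."""
    lines: list[str] = []
    for item in data or []:  # `data` may be None
        if isinstance(item, str):
            # Markdown output: keep only the first line as a preview.
            stripped = item.strip()
            lines.append(stripped.splitlines()[0] if stripped else "")
        else:
            # Structured output: list the keys that were extracted.
            lines.append(", ".join(sorted(item)))
    return lines
```

In practice this would be called as `describe_items(result.data)` on the `AiCrawlerJob` returned by `crawl()`.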

## Geographic Localization

The `geo_location` parameter supports multiple formats:

| Format | Example | Description |
|--------|---------|-------------|
| ISO 2-letter code | `"US"` | US, GB, DE, FR, etc. |
| Country canonical name | `"United States"` | Capitalized full name |
| Coordinate formats | See SERP Localization docs | Advanced localization |

Source: [readme.md](https://github.com/oxylabs/oxylabs-ai-studio-py/blob/main/readme.md)

## Error Handling

### Schema Validation Error

```python
# This raises ValueError
result = crawler.crawl(
    url="https://example.com",
    user_prompt="Extract prices",
    output_format="json",
    schema=None,  # Missing schema
)
# ValueError: openapi_schema is required when output_format is json, csv or toon.
```

### Timeout Handling

```python
try:
    result = crawler.crawl(
        url="https://example.com",
        user_prompt="Extract all products",
        output_format="markdown",
    )
except TimeoutError as e:
    print(f"Crawl failed: {e}")
    # Handle timeout - consider retrying with reduced scope
```
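
One way to retry with reduced scope is to lower `return_sources_limit` after each timeout. The sketch below assumes a hypothetical wrapper `crawl_with_shrinking_scope`; it accepts any callable with the same keyword arguments as `crawl()`, so it can be exercised without an API key.

```python
from __future__ import annotations

from typing import Any, Callable


def crawl_with_shrinking_scope(
    crawl: Callable[..., Any],
    limits: list[int],
    **kwargs: Any,
) -> Any:
    """Try `crawl` with each source limit in turn until one finishes in time."""
    last_error: TimeoutError | None = None
    for limit in limits:
        try:
            return crawl(return_sources_limit=limit, **kwargs)
        except TimeoutError as exc:
            last_error = exc  # remember the failure and try a smaller limit
    raise TimeoutError(f"All attempts timed out (limits tried: {limits})") from last_error
```

A call might look like `crawl_with_shrinking_scope(crawler.crawl, [25, 10, 3], url="https://example.com", user_prompt="Extract all products")`.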

### Keyboard Interrupt

When a user cancels the operation mid-polling, the crawler logs the cancellation and re-raises the `KeyboardInterrupt`:

```python
except KeyboardInterrupt:
    logger.info("[Cancelled] Crawling was cancelled by user.")
    raise KeyboardInterrupt from None
```

Source: [src/oxylabs_ai_studio/apps/ai_crawler.py:80-82]()

## Best Practices

### 1. Set Appropriate Source Limits

```python
# Limit to most relevant sources
result = crawler.crawl(
    url="https://ecommerce.example.com",
    user_prompt="Product pages with pricing",
    return_sources_limit=10,  # Balance between coverage and performance
)
```

### 2. Use Specific Prompts

```python
# Good: Specific and actionable
result = crawler.crawl(
    url="https://example.com",
    user_prompt="Find all blog posts published in 2024 with author names and publication dates",
)

# Less effective: Too vague
result = crawler.crawl(
    url="https://example.com",
    user_prompt="Find stuff",
)
```

### 3. Handle JavaScript Rendering Selectively

```python
# Only enable if necessary - adds latency
result = crawler.crawl(
    url="https://spa.example.com",
    user_prompt="Extract dashboard metrics",
    render_javascript=True,  # Required for SPAs
)
```

### 4. Credit Management

```python
# Set maximum credits for cost control
result = crawler.crawl(
    url="https://example.com",
    user_prompt="Extract product data",
    max_credits=100,  # Prevents runaway costs
)
```

## Related Features

| Feature | Module | Purpose |
|---------|--------|---------|
| AI-Scraper | `AiScraper` | Single-page extraction without crawling |
| AI-Search | `AiSearch` | Search engine result extraction |
| AI-Map | `AiMap` | URL discovery and site mapping |
| Browser-Agent | `BrowserAgent` | Interactive browser automation |

Source: [readme.md](https://github.com/oxylabs/oxylabs-ai-studio-py/blob/main/readme.md)

---

<a id='ai-search'></a>

## AI-Search Feature

### Related Pages

Related topics: [AI-Scraper Feature](#ai-scraper), [Client Architecture](#client-architecture)

<details>
<summary>Related source files</summary>

The following source files were used to generate this page:

- [src/oxylabs_ai_studio/apps/ai_search.py](https://github.com/oxylabs/oxylabs-ai-studio-py/blob/main/src/oxylabs_ai_studio/apps/ai_search.py)
- [examples/search_with_content.py](https://github.com/oxylabs/oxylabs-ai-studio-py/blob/main/examples/search_with_content.py)
- [examples/search_no_content.py](https://github.com/oxylabs/oxylabs-ai-studio-py/blob/main/examples/search_no_content.py)
- [examples/search_instant.py](https://github.com/oxylabs/oxylabs-ai-studio-py/blob/main/examples/search_instant.py)
</details>

# AI-Search Feature

## Overview

The AI-Search feature provides a programmatic interface for performing AI-powered search engine results page (SERP) queries. It enables users to search for information and retrieve results with optional full content extraction, JavaScript rendering support, and geographic localization.

The feature offers two search modes:
- **Standard Search**: A polling-based approach for retrieving comprehensive search results with content
- **Instant Search**: A lightweight endpoint optimized for quick results (up to 10 results) without content

Source: [src/oxylabs_ai_studio/apps/ai_search.py:1-50]()

## Architecture

### Class Hierarchy

The AI-Search feature is built on the `OxyStudioAIClient` base class, which provides HTTP client functionality and API communication capabilities.

```python
class AiSearch(OxyStudioAIClient):
    """AI Search app."""
```

Source: [src/oxylabs_ai_studio/apps/ai_search.py:37-38]()

### Module Structure

| Component | File | Responsibility |
|-----------|------|----------------|
| AiSearch | ai_search.py | Main synchronous interface |
| AiSearchJob | ai_search.py | Response data model |
| SearchResult | ai_search.py | Individual result data model |

## Data Models

### SearchResult

Represents a single search result entry.

| Field | Type | Description |
|-------|------|-------------|
| url | str | The URL of the search result |
| title | str | The title of the search result |
| description | str | The description/snippet of the search result |
| content | str \| None | Full content of the page (when return_content=True) |

Source: [src/oxylabs_ai_studio/apps/ai_search.py:22-27]()

### AiSearchJob

Represents the complete search job response.

| Field | Type | Description |
|-------|------|-------------|
| run_id | str | Unique identifier for the search job |
| message | str \| None | Status message or error code |
| data | list[SearchResult] \| None | List of search results |

Source: [src/oxylabs_ai_studio/apps/ai_search.py:29-31]()
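
The two tables above can be read as the following shapes. This sketch uses standard-library dataclasses as stand-ins so that it runs without dependencies; the SDK's own classes may be defined differently, and `urls_with_content` is a hypothetical helper added for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SearchResult:
    url: str
    title: str
    description: str
    content: str | None = None  # populated only when return_content=True


@dataclass
class AiSearchJob:
    run_id: str
    message: str | None = None
    data: list[SearchResult] | None = None


def urls_with_content(job: AiSearchJob) -> list[str]:
    """Collect the URLs of results that carry full page content."""
    return [r.url for r in job.data or [] if r.content is not None]
```

Treating `data` as possibly `None` avoids a `TypeError` when a job returns no results.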

## API Methods

### Synchronous Interface

#### search()

Performs a standard search with polling until results are available.

```python
def search(
    self,
    query: str,
    limit: int = 10,
    render_javascript: bool = False,
    return_content: bool = True,
    geo_location: str | None = None,
) -> AiSearchJob
```

**Parameters:**

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| query | str | required | The search query string |
| limit | int | 10 | Maximum number of results (max: 50) |
| render_javascript | bool | False | Enable JavaScript rendering |
| return_content | bool | True | Include full content in results |
| geo_location | str \| None | None | Geographic location for localized results |

**Return Type:** `AiSearchJob`

Source: [src/oxylabs_ai_studio/apps/ai_search.py:43-55]()

#### instant_search()

Performs a fast search using the instant endpoint without polling.

```python
def instant_search(
    self,
    query: str,
    limit: int = 10,
    geo_location: str | None = None,
) -> AiSearchJob
```

**Parameters:**

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| query | str | required | The search query string |
| limit | int | 10 | Maximum number of results (max: 10) |
| geo_location | str \| None | None | Geographic location for localized results |

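**Note:** `instant_search()` calls the `/search/instant` endpoint directly and never polls. According to the endpoint selection logic documented for this feature, `search()` also routes to that endpoint when `limit <= 10` and `return_content=False`.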

Source: [src/oxylabs_ai_studio/apps/ai_search.py:95-106]()

### Asynchronous Interface

#### search_async()

Async version of the standard search method.

```python
async def search_async(
    self,
    query: str,
    limit: int = 10,
    render_javascript: bool = False,
    return_content: bool = True,
    geo_location: str | None = None,
) -> AiSearchJob
```

#### instant_search_async()

Async version of the instant search method.

```python
async def instant_search_async(
    self,
    query: str,
    limit: int = 10,
    geo_location: str | None = None,
) -> AiSearchJob
```

Source: [src/oxylabs_ai_studio/apps/ai_search.py:108-148]()

## Workflow and State Management

### Standard Search Polling Flow

```mermaid
graph TD
    A[Start search] --> B[Call /search/run API]
    B --> C[Extract run_id]
    C --> D[Call /search/run/data API]
    D --> E{Status Check}
    E -->|202 Pending| F[Wait POLL_INTERVAL_SECONDS]
    E -->|200 Completed| G[Return AiSearchJob with data]
    E -->|200 Failed| H[Return AiSearchJob with error]
    F --> D
    H --> I[End with error]
    G --> J[End success]
    
    style A fill:#e1f5ff
    style G fill:#c8e6c9
    style H fill:#ffcdd2
```

### Instant Search Flow

```mermaid
graph TD
    A[Start instant_search] --> B[Call /search/instant API]
    B --> C{Status 200?}
    C -->|Yes| D[Parse response JSON]
    C -->|No| E[Raise Exception]
    D --> F[Return AiSearchJob]
    F --> G[End success]
    E --> H[End with error]
    
    style A fill:#e1f5ff
    style F fill:#c8e6c9
    style E fill:#ffcdd2
```

### Endpoint Selection Logic

```mermaid
graph TD
    A[search called] --> B{limit <= 10?}
    B -->|Yes| C{return_content == False?}
    B -->|No| D[Use standard /search/run]
    C -->|Yes| E[Use instant /search/instant]
    C -->|No| D
    E --> F[Return immediately]
    D --> G[Start polling]
    G --> H{Status completed?}
    H -->|Yes| I[Return results]
    H -->|No| J{Status failed?}
    J -->|Yes| K[Return with error]
    J -->|No| L[Continue polling]
    L --> G
    
    style A fill:#e1f5ff
    style E fill:#c8e6c9
    style K fill:#ffcdd2
```
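
The decision tree above reduces to a single condition. A minimal sketch, assuming a hypothetical function `select_endpoint` that returns the endpoint path (the SDK may structure this check differently):

```python
INSTANT_MAX_LIMIT = 10  # upper bound taken from the diagram above


def select_endpoint(limit: int, return_content: bool) -> str:
    """Pick the endpoint path following the documented decision tree."""
    if limit <= INSTANT_MAX_LIMIT and not return_content:
        return "/search/instant"  # answered immediately, no polling
    return "/search/run"  # a job is created, then polled via /search/run/data
```

Both conditions must hold for the instant path; requesting content or more than 10 results falls back to the polling flow.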

## Configuration Constants

| Constant | Value | Description |
|----------|-------|-------------|
| SEARCH_TIMEOUT_SECONDS | 180 (60 * 3) | Maximum time to wait for search completion |
| POLL_INTERVAL_SECONDS | 5 | Time between polling attempts |
| POLL_MAX_ATTEMPTS | 36 | Maximum number of polling attempts |

Source: [src/oxylabs_ai_studio/apps/ai_search.py:11-13]()

## Usage Examples

### Search with Content

Retrieves search results including full page content:

```python
from oxylabs_ai_studio.apps.ai_search import AiSearch

search = AiSearch(api_key="<API_KEY>")

query = "lasagna recipe"
result = search.search(
    query=query,
    limit=5,
    render_javascript=False,
    return_content=True,
)
print(result.data)
```

Source: [examples/search_with_content.py:1-15]()

### Search Without Content

Performs a lightweight search returning only URL, title, and description:

```python
from oxylabs_ai_studio.apps.ai_search import AiSearch

search = AiSearch(api_key="<API_KEY>")

query = "lasagna"
result = search.search(
    query=query,
    limit=5,
    render_javascript=False,
    return_content=False,
    geo_location="Italy",
)
print(result.data)
```

Source: [examples/search_no_content.py:1-17]()

### Instant Search

Fast search for up to 10 results with geographic localization:

```python
from oxylabs_ai_studio.apps.ai_search import AiSearch

search = AiSearch(api_key="<API_KEY>")

query = "lasagna recipes"
result = search.instant_search(
    query=query,
    limit=5,
    geo_location="United States",
)
print(result.data)
```

Source: [examples/search_instant.py:1-14]()

## Geographic Localization

The `geo_location` parameter supports multiple formats:

| Format | Example |
|--------|---------|
| ISO 2-letter code | "US", "DE", "FR" |
| Country canonical name | "United States", "Germany", "France" |
| Coordinate formats | Supported per SERP Localization docs |

Supported locations are documented at: [SERP Localization](https://developers.oxylabs.io/scraping-solutions/web-scraper-api/features/localization/serp-localization)

## Error Handling

| Scenario | Behavior |
|----------|----------|
| Empty query | Raises `ValueError("query is required")` |
| API returns non-200 status | Raises `Exception` with response text |
| Search timeout | Raises `TimeoutError` |
| Keyboard interrupt | Logs cancellation and re-raises |

Source: [src/oxylabs_ai_studio/apps/ai_search.py:77-82]()

## Key Implementation Details

### Request Body Construction

Both search methods construct a standardized request body:

```python
body = {
    "query": query,
    "limit": limit,
    "render_javascript": render_javascript,
    "return_content": return_content,
    "geo_location": geo_location,
}
```

### API Endpoints

| Endpoint | Method | Purpose |
|----------|--------|---------|
| /search/run | POST | Create a new search job |
| /search/run/data | GET | Poll for search results |
| /search/instant | POST | Execute instant search |

### Timeout Calculation

```python
POLL_MAX_ATTEMPTS = SEARCH_TIMEOUT_SECONDS // POLL_INTERVAL_SECONDS
# 180 // 5 = 36 attempts
```

---

<a id='ai-map'></a>

## AI-Map Feature

### Related Pages

Related topics: [AI-Crawler Feature](#ai-crawler), [Client Architecture](#client-architecture)

<details>
<summary>Related source files</summary>

The following source files were used to generate this page:

- [src/oxylabs_ai_studio/apps/ai_map.py](https://github.com/oxylabs/oxylabs-ai-studio-py/blob/main/src/oxylabs_ai_studio/apps/ai_map.py)
- [examples/ai_map.py](https://github.com/oxylabs/oxylabs-ai-studio-py/blob/main/examples/ai_map.py)
</details>

# AI-Map Feature

## Overview

The AI-Map feature is a URL discovery and site mapping tool within the Oxylabs AI Studio Python SDK. It enables users to explore website structures by mapping URLs based on specified keywords, crawl depth, and filtering criteria. The feature automatically discovers URLs from sitemaps and linked pages, returning a structured list of discovered endpoints that match user-defined search parameters.

AI-Map serves as the first step in many web scraping workflows, helping users understand the structure of a target website before proceeding with detailed content extraction using tools like AiCrawler or AiScraper.

Source: [readme.md](readme.md)

## Core Functionality

The `AiMap` class provides a single primary method: `map()`. This method accepts a comprehensive configuration payload that controls URL discovery behavior. The feature supports:

- **Keyword-based filtering**: Filter discovered URLs by search keywords or natural language prompts
- **Crawl depth control**: Limit how deep the mapping exploration goes (1-5 levels)
- **Result limiting**: Cap the total number of URLs returned
- **Geographic targeting**: Discover URLs with specific geo-location configurations
- **JavaScript rendering**: Enable JS rendering for dynamically loaded links
- **Sitemap integration**: Include or exclude sitemap-based URL discovery
- **Domain scope control**: Allow or restrict subdomains and external domains

Source: [examples/ai_map.py](examples/ai_map.py)

## Architecture

```mermaid
graph TD
    A[User calls ai_map.map payload] --> B[AiMap.map method]
    B --> C[Build request payload]
    C --> D[POST to /map endpoint]
    D --> E{Response status?}
    E -->|pending| F[Poll for completion]
    E -->|completed| G[Return AiMapJob]
    E -->|failed| H[Return error]
    F --> E
    G --> I[Extract result.data]
    H --> J[Raise exception]
    
    style A fill:#e1f5ff
    style G fill:#c8e6c9
    style J fill:#ffcdd2
```

## Class Reference

### AiMap

**Module**: `oxylabs_ai_studio.apps.ai_map`

**Constructor**:

```python
AiMap(api_key: str)
```

| Parameter | Type | Description |
|-----------|------|-------------|
| api_key | str | Oxylabs API key for authentication (required) |

Source: [examples/ai_map.py](examples/ai_map.py)

### map() Method

**Signature**:

```python
def map(self, **payload) -> AiMapJob
```

**Parameters Table**:

| Parameter | Type | Default | Required | Description |
|-----------|------|---------|----------|-------------|
| url | str | - | Yes | Starting URL or domain to map |
| search_keywords | list[str] \| None | None | No | Keywords for URL path filtering |
| user_prompt | str \| None | None | No | Natural language prompt for keyword search. Can be used together with `search_keywords` or on its own |
| max_crawl_depth | int | 1 | No | Maximum crawl depth (range: 1-5) |
| limit | int | 25 | No | Maximum number of URLs to return |
| geo_location | str \| None | None | No | Proxy location in ISO2 format or country canonical name |
| render_javascript | bool | False | No | Enable JavaScript rendering for dynamic content |
| include_sitemap | bool | True | No | Include sitemap as a seed source for URL discovery |
| max_credits | int \| None | None | No | Maximum credits to use for this operation |
| allow_subdomains | bool | False | No | Allow mapping of subdomain URLs |
| allow_external_domains | bool | False | No | Allow mapping of external domain URLs |

Source: [readme.md](readme.md)
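
Since `map()` takes its configuration as keyword arguments, it can help to assemble and check the payload in one place. The sketch below assumes a hypothetical helper `build_map_payload`. The depth check reflects the documented 1-5 range, but whether the SDK enforces it client-side is not confirmed here.

```python
from __future__ import annotations

from typing import Any


def build_map_payload(
    url: str,
    search_keywords: list[str] | None = None,
    user_prompt: str | None = None,
    max_crawl_depth: int = 1,
    limit: int = 25,
    **extra: Any,
) -> dict[str, Any]:
    """Assemble a payload for AiMap.map(), rejecting out-of-range depths early."""
    if not url:
        raise ValueError("url is required")
    if not 1 <= max_crawl_depth <= 5:
        raise ValueError("max_crawl_depth must be between 1 and 5")
    payload: dict[str, Any] = {
        "url": url,
        "max_crawl_depth": max_crawl_depth,
        "limit": limit,
        **extra,
    }
    # Optional filters are only included when the caller supplies them.
    if search_keywords:
        payload["search_keywords"] = search_keywords
    if user_prompt:
        payload["user_prompt"] = user_prompt
    return payload
```

The result can then be passed on as `ai_map.map(**build_map_payload("https://oxylabs.io", search_keywords=["blog"]))`.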

## Usage Examples

### Basic URL Mapping

```python
from oxylabs_ai_studio.apps.ai_map import AiMap

ai_map = AiMap(api_key="<API_KEY>")
payload = {
    "url": "https://oxylabs.io",
    "search_keywords": ["blog"],
    "max_crawl_depth": 3,
    "limit": 50,
    "render_javascript": False,
    "include_sitemap": True,
    "max_credits": None,
    "allow_subdomains": False,
    "allow_external_domains": False,
}
result = ai_map.map(**payload)
print(result.data)
```

Source: [examples/ai_map.py](examples/ai_map.py)

### Mapping Career Pages

```python
from oxylabs_ai_studio.apps.ai_map import AiMap

ai_map = AiMap(api_key="<API_KEY>")
payload = {
    "url": "https://career.oxylabs.io",
    "search_keywords": ["career", "jobs", "vacancy"],
    "user_prompt": "job ad pages",
    "max_crawl_depth": 2,
    "limit": 10,
    "geo_location": "Germany",
    "render_javascript": False,
    "include_sitemap": True,
    "max_credits": None,
    "allow_subdomains": False,
    "allow_external_domains": False,
}
result = ai_map.map(**payload)
print(result.data)
```

Source: [readme.md](readme.md)

## Response Model

### AiMapJob

| Field | Type | Description |
|-------|------|-------------|
| run_id | str | Unique identifier for this mapping job |
| message | str \| None | Status message or error code |
| data | list \| None | Discovered URLs matching the search criteria |

## Workflow Diagram: Complete Scraping Pipeline

```mermaid
graph LR
    A[Define target domain] --> B[Use AiMap to discover URLs]
    B --> C{URLs discovered?}
    C -->|Yes| D[Filter and select URLs]
    C -->|No| E[Adjust keywords/depth]
    E --> B
    D --> F[Use AiCrawler to crawl content]
    F --> G{Detailed extraction needed?}
    G -->|Yes| H[Use AiScraper per URL]
    G -->|No| I[Process crawled data]
    H --> I
    I --> J[Store/Analyze results]
    
    style B fill:#fff9c4
    style F fill:#c8e6c9
    style H fill:#c8e6c9
```
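
The "adjust keywords/depth" loop at the start of the pipeline can be sketched as a step-wise depth increase. `discover_urls` is a hypothetical helper; `map_fn` stands for any callable with the keyword arguments of `AiMap.map()` that returns an object with a `data` attribute.

```python
from __future__ import annotations

from typing import Any, Callable


def discover_urls(
    map_fn: Callable[..., Any],
    url: str,
    keywords: list[str],
    max_depth: int = 3,
) -> list[str]:
    """Increase crawl depth step by step until the mapper returns URLs."""
    for depth in range(1, max_depth + 1):
        job = map_fn(url=url, search_keywords=keywords, max_crawl_depth=depth)
        found = getattr(job, "data", None) or []
        if found:
            return list(found)
    return []  # nothing matched: the caller may want to adjust the keywords
```

Starting shallow keeps credit usage low when the first level already yields enough URLs.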

## Parameter Interaction

| Parameter | Affects | Interaction Notes |
|-----------|---------|-------------------|
| url | All | Root domain determines scope of mapping |
| max_crawl_depth | API calls, credits | Higher depth increases API usage and discovery scope |
| limit | Result size | Combined with depth to control total URL count |
| search_keywords | Filter accuracy | More specific keywords reduce false positives |
| user_prompt | AI interpretation | Can be combined with search_keywords or used on its own |
| include_sitemap | Initial URL seed | When True, sitemap URLs are added to discovery queue |
| geo_location | Content variant | URLs may vary based on geo-targeted content |
| allow_subdomains | Scope expansion | When True, expands discovery beyond main domain |

## Best Practices

1. **Start with low crawl depth**: Begin with `max_crawl_depth=1` to understand basic site structure before expanding
2. **Use specific keywords**: Combine `search_keywords` with `user_prompt` for precise URL filtering
3. **Set appropriate limits**: Use `limit` to prevent excessive API usage and manage response sizes
4. **Enable sitemap**: Keep `include_sitemap=True` for comprehensive initial URL discovery
5. **Consider geo-location**: If targeting region-specific pages, specify `geo_location` in the initial mapping

## Common Use Cases

| Use Case | Recommended Configuration |
|----------|--------------------------|
| Blog post discovery | `{"search_keywords": ["blog", "article"], "max_crawl_depth": 2}` |
| E-commerce product pages | `{"search_keywords": ["product", "shop"], "max_crawl_depth": 3}` |
| Documentation site mapping | `{"include_sitemap": True, "max_crawl_depth": 4}` |
| Job listing discovery | `{"search_keywords": ["jobs", "careers", "vacancy"], "max_crawl_depth": 2}` |
| News article aggregation | `{"search_keywords": ["news", "article"], "limit": 100}` |

## Integration with Other Features

The AI-Map feature is designed to work as part of a larger scraping pipeline. After discovering URLs, users typically proceed with:

1. **AiCrawler**: For bulk content extraction from discovered URLs
2. **AiScraper**: For detailed structured data extraction from individual pages
3. **BrowserAgent**: For interactive browsing tasks requiring user-like navigation

Source: [readme.md](readme.md), [agentic_code_guide.md](agentic_code_guide.md)

---

<a id='browser-agent'></a>

## Browser Agent Feature

### Related Pages

Related topics: [AI-Scraper Feature](#ai-scraper), [Data Models](#data-models)

<details>
<summary>Related source files</summary>

The following source files were used to generate this page:

- [src/oxylabs_ai_studio/apps/browser_agent.py](https://github.com/oxylabs/oxylabs-ai-studio-py/blob/main/src/oxylabs_ai_studio/apps/browser_agent.py)
- [examples/browser_agent.py](https://github.com/oxylabs/oxylabs-ai-studio-py/blob/main/examples/browser_agent.py)
- [readme.md](https://github.com/oxylabs/oxylabs-ai-studio-py/blob/main/readme.md)
- [agentic_code_guide.md](https://github.com/oxylabs/oxylabs-ai-studio-py/blob/main/agentic_code_guide.md)
</details>

# Browser Agent Feature

## Overview

The Browser Agent is a browser automation tool in the Oxylabs AI Studio Python SDK. It controls a browser programmatically to click, scroll, navigate, and extract data from dynamic web pages. Unlike traditional scraping methods, it is guided by natural language prompts, which makes it well suited to websites that require JavaScript rendering or user interaction.

The feature translates high-level natural language instructions into low-level browser actions, hiding the details of web interaction while staying flexible across use cases. It also works with schema generation to return structured data.

## Architecture

### Component Overview

The Browser Agent feature is built on a client-server architecture where the Python SDK acts as a thin client communicating with Oxylabs' cloud-based browser automation infrastructure.

```mermaid
graph TD
    A[BrowserAgent Python Client] --> B[AI Studio API Gateway]
    B --> C[Browser Agent Service]
    C --> D[Browser Instance Pool]
    D --> E[Target Website]
    F[Schema Generation Service] --> B
```

### Key Components

| Component | Location | Responsibility |
|-----------|----------|----------------|
| `BrowserAgent` | `src/oxylabs_ai_studio/apps/browser_agent.py` | Main client class for browser automation |
| API Client | Base HTTP client | Handles HTTP communication with API |
| Polling Mechanism | Built-in | Monitors job status until completion |
| Schema Generator | Built-in | Creates OpenAPI schemas from prompts |

### Class Hierarchy

```
OxyStudioAIClient
└── BrowserAgent
    ├── run()
    ├── run_async()
    ├── generate_schema()
    ├── generate_schema_async()
    ├── call_api()
    ├── call_api_async()
    └── get_client() / async_client()
```

## Data Models

### BrowserAgentJob

The primary output model returned by Browser Agent operations.

```python
class DataModel(BaseModel):
    type: Literal["json", "markdown", "html", "screenshot", "csv"]
    content: dict[str, Any] | str | None

class BrowserAgentJob(BaseModel):
    run_id: str
    message: str | None = None
    data: DataModel | None = None
```

| Field | Type | Description |
|-------|------|-------------|
| `run_id` | `str` | Unique identifier for the browser agent job |
| `message` | `str \| None` | Error message or status information if job failed |
| `data` | `DataModel \| None` | Contains the extracted data with type and content |

### DataModel Fields

| Field | Type | Description |
|-------|------|-------------|
| `type` | `Literal["json", "markdown", "html", "screenshot", "csv"]` | Format of the extracted content |
| `content` | `dict[str, Any] \| str \| None` | The actual extracted data |
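
Because `content` can be a dictionary, a string, or `None`, callers typically branch on `type` before using it. A minimal sketch, assuming a hypothetical helper `content_to_text` that is not part of the SDK:

```python
from __future__ import annotations

import json
from typing import Any


def content_to_text(data_type: str, content: dict[str, Any] | str | None) -> str:
    """Turn a DataModel-like (type, content) pair into printable text."""
    if content is None:
        return ""
    if data_type == "json" and isinstance(content, dict):
        return json.dumps(content, indent=2, sort_keys=True)
    return str(content)  # markdown, html, csv, etc. are passed through as text
```

With a real job this would be called as `content_to_text(result.data.type, result.data.content)` after checking that `result.data` is not `None`.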

## API Reference

### BrowserAgent Class

**Import Statement:**
```python
from oxylabs_ai_studio.apps.browser_agent import BrowserAgent
```

**Initialization:**
```python
browser_agent = BrowserAgent(api_key="<API_KEY>")
```

### Method: `run()`

Synchronous method to execute browser agent tasks.

**Signature:**
```python
def run(
    self,
    url: str,
    user_prompt: str,
    output_format: Literal["json", "markdown"] = "markdown",
    schema: dict | None = None,
    geo_location: str | None = None,
    user_agent: str | None = None,
    render_javascript: bool | str = "auto",
) -> BrowserAgentJob
```

**Parameters:**

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `url` | `str` | Yes | - | Target URL to browse and interact with |
| `user_prompt` | `str` | Yes | - | Natural language prompt describing the task to perform |
| `output_format` | `Literal["json", "markdown"]` | No | `"markdown"` | Desired output format for extracted data |
| `schema` | `dict \| None` | Conditional | `None` | OpenAPI JSON schema for structured extraction (required when `output_format="json"`) |
| `geo_location` | `str \| None` | No | `None` | Proxy location in ISO2 format or country canonical name |
| `user_agent` | `str \| None` | No | `None` | Custom User-Agent request header |
| `render_javascript` | `bool \| str` | No | `"auto"` | JavaScript rendering option; can be `True`, `False`, or `"auto"` |

**Returns:** `BrowserAgentJob` object containing the job result

**Example Usage:**
```python
browser_agent = BrowserAgent(api_key="<API_KEY>")

prompt = "Find if there is game 'super mario odyssey' in the store."
url = "https://sandbox.oxylabs.io/"
result = browser_agent.run(
    url=url,
    user_prompt=prompt,
    output_format="json",
    schema={"type": "object", "properties": {"page_url": {"type": "string"}}, "required": []},
)
print(result.data)
```

Source: [src/oxylabs_ai_studio/apps/browser_agent.py:1-200](https://github.com/oxylabs/oxylabs-ai-studio-py/blob/main/src/oxylabs_ai_studio/apps/browser_agent.py)

### Method: `run_async()`

Asynchronous method to execute browser agent tasks without blocking.

**Signature:**
```python
async def run_async(
    self,
    url: str,
    user_prompt: str,
    output_format: Literal["json", "markdown"] = "markdown",
    schema: dict | None = None,
    geo_location: str | None = None,
) -> BrowserAgentJob
```

**Example Usage:**
```python
import asyncio
from oxylabs_ai_studio.apps.browser_agent import BrowserAgent

browser_agent = BrowserAgent(api_key="<API_KEY>")

async def main():
    prompt = "Find if there is game 'super mario odyssey' in the store."
    url = "https://sandbox.oxylabs.io/"
    result = await browser_agent.run_async(
        url=url,
        user_prompt=prompt,
        output_format="json",
        schema={"type": "object", "properties": {"page_url": {"type": "string"}}, "required": []},
    )
    print(result.data)

asyncio.run(main())
```

Source: [src/oxylabs_ai_studio/apps/browser_agent.py:200-280](https://github.com/oxylabs/oxylabs-ai-studio-py/blob/main/src/oxylabs_ai_studio/apps/browser_agent.py)

### Method: `generate_schema()`

Generates a JSON schema for structured data extraction based on a natural language prompt.

**Signature:**
```python
def generate_schema(self, prompt: str) -> dict[str, Any] | None
```

**Parameters:**

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `prompt` | `str` | Yes | Natural language description of the data structure to extract |

**Returns:** Dictionary containing the generated OpenAPI schema, or `None` if generation fails

**Example Usage:**
```python
browser_agent = BrowserAgent(api_key="<API_KEY>")

schema = browser_agent.generate_schema(
    prompt="game name, platform, review stars and price"
)
print("schema: ", schema)
```

Source: [src/oxylabs_ai_studio/apps/browser_agent.py:180-195](https://github.com/oxylabs/oxylabs-ai-studio-py/blob/main/src/oxylabs_ai_studio/apps/browser_agent.py)

### Method: `generate_schema_async()`

Asynchronous version of `generate_schema()`.

**Signature:**
```python
async def generate_schema_async(self, prompt: str) -> dict[str, Any] | None
```

Source: [src/oxylabs_ai_studio/apps/browser_agent.py:145-165](https://github.com/oxylabs/oxylabs-ai-studio-py/blob/main/src/oxylabs_ai_studio/apps/browser_agent.py)

## Execution Flow

### Synchronous Execution Workflow

```mermaid
sequenceDiagram
    participant Client as BrowserAgent Client
    participant API as AI Studio API
    participant Service as Browser Agent Service
    
    Client->>API: POST /browser-agent/run
    Note over API: Returns run_id (status: 201)
    Client->>API: GET /browser-agent/run/data?run_id=xxx
    API-->>Client: status: processing
    loop Poll until complete
        Client->>API: GET /browser-agent/run/data?run_id=xxx
        API-->>Client: status: processing
    end
    API-->>Client: status: completed, data returned
```

### Job Status States

The Browser Agent job follows a state machine pattern with the following statuses:

| Status | Description | Action |
|--------|-------------|--------|
| `processing` | Job is currently executing | Continue polling |
| `completed` | Job finished successfully | Return result |
| `failed` | Job encountered an error | Return error message |
| HTTP 202 | Job still initializing | Continue polling |
| HTTP 200 with no data | Unknown state | Continue polling |

Source: [src/oxylabs_ai_studio/apps/browser_agent.py:40-80](https://github.com/oxylabs/oxylabs-ai-studio-py/blob/main/src/oxylabs_ai_studio/apps/browser_agent.py)

### Polling Mechanism

The synchronous `run()` method implements a polling mechanism with the following characteristics:

- **Poll Interval**: Configured via `POLL_INTERVAL_SECONDS` constant
- **Timeout Handling**: Raises `TimeoutError` if job does not complete within expected timeframe
- **Interrupt Support**: Catches `KeyboardInterrupt` to gracefully cancel operations

```python
# Polling loop structure (simplified)
while True:
    get_response = self.call_api(...)
    resp_body = get_response.json()
    
    if resp_body["status"] == "completed":
        return BrowserAgentJob(run_id=run_id, data=resp_body["data"])
    if resp_body["status"] == "failed":
        return BrowserAgentJob(run_id=run_id, message=resp_body.get("error_code"))
    
    time.sleep(POLL_INTERVAL_SECONDS)
```
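The simplified loop above omits the timeout check. A self-contained sketch of polling with a deadline could look like the following; `poll_until_done` and its parameters are hypothetical names chosen for illustration, and `fetch` stands in for whatever call retrieves the job status.

```python
import time
from typing import Any, Callable


def poll_until_done(
    fetch: Callable[[], dict[str, Any]],
    interval_seconds: float,
    timeout_seconds: float,
) -> dict[str, Any]:
    """Call fetch() until the job completes or fails, or the deadline passes."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        body = fetch()
        if body.get("status") in ("completed", "failed"):
            return body
        # Stop early if the next sleep would carry us past the deadline.
        if time.monotonic() + interval_seconds > deadline:
            raise TimeoutError("job did not finish before the deadline")
        time.sleep(interval_seconds)


# Fake status source: two "processing" responses, then "completed".
responses = iter(
    [
        {"status": "processing"},
        {"status": "processing"},
        {"status": "completed", "data": {"ok": True}},
    ]
)
result = poll_until_done(lambda: next(responses), interval_seconds=0.01, timeout_seconds=1.0)
print(result)
```

Using `time.monotonic()` rather than wall-clock time keeps the deadline stable if the system clock is adjusted while the job is running.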

## Use Cases

### E-commerce Product Discovery

The Browser Agent excels at navigating websites that require user interaction:

```python
schema = browser_agent.generate_schema(
    prompt="game name, platform, review stars and price"
)

prompt = "Find if there is game 'super mario odyssey' in the store. If there is, find the price. Use search bar to find the game."
result = browser_agent.run(
    url="https://sandbox.oxylabs.io/",
    user_prompt=prompt,
    output_format="json",
    schema=schema,
    geo_location="Spain",
)
```

### Recommended Workflow for Complex Extraction

For multi-step extraction tasks, combine Browser Agent with other AI Studio tools:

1. **Browser Agent**: Navigate to the target page and identify relevant URLs
2. **AiScraper**: Extract structured data from identified pages
3. **Schema Generation**: Create appropriate schemas for each extraction phase

Source: [examples/browser_agent.py](https://github.com/oxylabs/oxylabs-ai-studio-py/blob/main/examples/browser_agent.py)

## Error Handling

### Exception Types

| Exception | Cause | Handling |
|-----------|-------|----------|
| `TimeoutError` | Job exceeded timeout threshold | Retry with exponential backoff |
| `KeyboardInterrupt` | User cancelled operation | Clean up and exit gracefully |
| `Exception` | API request failed | Check API key, network connectivity |

### Error Response Handling

```python
if resp_body["status"] == "failed":
    return BrowserAgentJob(
        run_id=run_id,
        message=resp_body.get("error_code", None),
        data=None,
    )
```

### Schema Generation Errors

```python
if response.status_code != 200:
    raise Exception(f"Failed to generate schema: {response.text}")
```

## Configuration Options

### Proxy Location

Specify geographic location for requests:

```python
result = browser_agent.run(
    url="https://example.com",
    user_prompt="Extract product information",
    geo_location="Germany",  # or "DE" for ISO2 format
)
```

Supported formats:
- ISO 2-letter country codes (e.g., "DE", "US")
- Country canonical names (e.g., "Germany", "United States")

### JavaScript Rendering

Control JavaScript rendering behavior:

| Value | Behavior |
|-------|----------|
| `False` | No JavaScript rendering (fastest) |
| `True` | Always render JavaScript |
| `"auto"` | Service automatically detects if rendering is needed |

### User-Agent Customization

```python
result = browser_agent.run(
    url="https://example.com",
    user_prompt="Navigate and extract",
    user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
)
```

## Best Practices

1. **Schema Definition**: Always provide a well-defined schema when using `output_format="json"` for predictable results

2. **Async for Multiple Tasks**: Use `run_async()` when running multiple browser agents concurrently to maximize throughput

3. **Interrupt Handling**: Wrap long-running operations in try-except blocks to handle user cancellations

4. **Error Retries**: Implement retry logic with exponential backoff for transient failures:
   ```python
   for attempt in range(3):
       try:
           result = browser_agent.run(url=url, user_prompt=prompt)
           break
       except TimeoutError:
           time.sleep(2 ** attempt)
   ```

5. **Geo-Location**: Use appropriate `geo_location` values when targeting region-specific content
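The inline retry snippet in item 4 leaves `result` undefined when every attempt times out. A reusable helper that re-raises the last error instead might look like this sketch; `retry_on_timeout` is a hypothetical name, and `flaky` simulates a call such as `browser_agent.run(...)` so the example runs without network access.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_on_timeout(
    task: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
) -> T:
    """Run task(), retrying on TimeoutError with delays base_delay * 2**n."""
    for attempt in range(attempts):
        try:
            return task()
        except TimeoutError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            time.sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")


calls = {"count": 0}


def flaky() -> str:
    """Simulated task: times out twice, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise TimeoutError("simulated timeout")
    return "done"


print(retry_on_timeout(flaky, attempts=3, base_delay=0.01))
```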

## Comparison with Other Apps

| Feature | Browser Agent | AI Scraper | AI Crawler |
|---------|---------------|------------|------------|
| Navigation Actions | ✅ | ❌ | ❌ |
| JavaScript Interaction | ✅ | Configurable | Configurable |
| Pagination Handling | ✅ (manual) | Manual | Automatic |
| Single Page Focus | ✅ | ✅ | ❌ |
| Schema Generation | ✅ | ✅ | ✅ |
| Output Formats | json, markdown | json, markdown, csv, screenshot | json, markdown, csv, toon |

## API Endpoints Reference

| Endpoint | Method | Purpose |
|----------|--------|---------|
| `/browser-agent/run` | POST | Initiate a browser agent job |
| `/browser-agent/run/data` | GET | Poll job status and retrieve results |
| `/browser-agent/generate-params` | POST | Generate extraction schema from prompt |

## See Also

- [AI-Scraper Feature](#ai-scraper) - Single-page content extraction
- [AI-Crawler Feature](#ai-crawler) - Multi-page website crawling
- [AI-Search Feature](#ai-search) - Search engine result extraction
- [AI-Map Feature](#ai-map) - Site mapping and discovery

---

<a id='client-architecture'></a>

## Client Architecture

### Related Pages

Related topics: [Data Models](#data-models), [Error Handling and Logging](#error-handling-logging)

<details>
<summary>Relevant Source Files</summary>

The following source files were used to generate this page:

- [src/oxylabs_ai_studio/client.py](https://github.com/oxylabs/oxylabs-ai-studio-py/blob/main/src/oxylabs_ai_studio/client.py)
- [src/oxylabs_ai_studio/utils.py](https://github.com/oxylabs/oxylabs-ai-studio-py/blob/main/src/oxylabs_ai_studio/utils.py)
- [src/oxylabs_ai_studio/__init__.py](https://github.com/oxylabs/oxylabs-ai-studio-py/blob/main/src/oxylabs_ai_studio/__init__.py)
- [src/oxylabs_ai_studio/apps/ai_scraper.py](https://github.com/oxylabs/oxylabs-ai-studio-py/blob/main/src/oxylabs_ai_studio/apps/ai_scraper.py)
- [src/oxylabs_ai_studio/apps/ai_search.py](https://github.com/oxylabs/oxylabs-ai-studio-py/blob/main/src/oxylabs_ai_studio/apps/ai_search.py)
- [src/oxylabs_ai_studio/apps/ai_crawler.py](https://github.com/oxylabs/oxylabs-ai-studio-py/blob/main/src/oxylabs_ai_studio/apps/ai_crawler.py)
</details>

# Client Architecture

## Overview

The Oxylabs AI Studio Python SDK follows a layered client architecture that provides a unified interface for interacting with various AI-powered web scraping and data extraction services. The architecture separates concerns between HTTP communication, API interaction, and application-specific logic, enabling modularity and maintainability.

The client layer serves as the foundation for all application modules (`AiScraper`, `AiSearch`, `AiCrawler`, `BrowserAgent`, `AiMap`) by providing shared functionality for API communication, authentication, request building, and response handling.

## Architecture Components

### Component Overview

| Component | File | Purpose |
|-----------|------|---------|
| `APIClient` | `client.py` | Core HTTP client for all API communications |
| `BaseApp` | `client.py` | Abstract base class for application modules |
| Utility Functions | `utils.py` | Logging, retry logic, and helper utilities |
| Application Modules | `apps/*.py` | Domain-specific API wrappers |

### Class Hierarchy

```mermaid
graph TD
    A[APIClient] --> B[BaseApp]
    B --> C[AiScraper]
    B --> D[AiSearch]
    B --> E[AiCrawler]
    B --> F[BrowserAgent]
    B --> G[AiMap]
    
    H[Requests Session] --> A
    I[Configuration] --> A
```

## API Client (`APIClient`)

### Purpose and Responsibilities

The `APIClient` class is the core HTTP communication layer that handles:

- **Authentication**: Attaches API credentials to all requests
- **Connection Management**: Manages HTTP session lifecycle
- **Base URL Configuration**: Stores the API endpoint configuration
- **Request Execution**: Performs HTTP calls to the Oxylabs API

### Configuration Parameters

| Parameter | Type | Description | Default |
|-----------|------|-------------|---------|
| `api_key` | `str` | Authentication key for Oxylabs API | Required |
| `base_url` | `str` | API base endpoint | `https://api-aistudio.oxylabs.io` |
| `timeout` | `int` | Request timeout in seconds | Configurable |
| `max_retries` | `int` | Maximum retry attempts for failed requests | Configurable |

### Key Methods

```mermaid
graph TD
    A[make_request] --> B{Method Type}
    B -->|POST| C[POST Request]
    B -->|GET| D[GET Request]
    C --> E[Attach JSON Body]
    D --> F[Attach Query Params]
    E --> G[Execute Request]
    F --> G
    G --> H{Response Status}
    H -->|2xx| I[Return Response]
    H -->|4xx/5xx| J[Raise Exception]
```

The `APIClient` exposes methods that all application modules use for API communication:

- `call_api(client, url, method, body, params)` - Generic API call method
- `get_client()` - Returns configured HTTP client instance
- Session management methods for connection pooling

## Base Application Class (`BaseApp`)

### Purpose and Responsibilities

The `BaseApp` class serves as an abstract base for all application-specific modules. It provides:

- **Common API Interface**: Unified `call_api()` method across all apps
- **Client Initialization**: Automatic HTTP client setup with authentication
- **Polling Infrastructure**: Shared job status polling mechanism
- **Error Handling**: Standardized exception handling patterns

### Polling Mechanism

All job-based operations, synchronous and asynchronous alike, poll a status endpoint to check for job completion:

```mermaid
graph TD
    A[Submit Job Request] --> B[Get run_id]
    B --> C[Poll Status Endpoint]
    C --> D{Status Check}
    D -->|202 Processing| C
    D -->|200 Completed| E[Return Result]
    D -->|Error| F[Return Error]
    G[Max Timeout] -->|Exceeded| H[Raise TimeoutError]
```

### Polling Configuration

| Parameter | Value | Description |
|-----------|-------|-------------|
| `POLL_INTERVAL_SECONDS` | `2` | Seconds between status checks |
| `MAX_TIMEOUT_SECONDS` | `300` | Maximum wait time before timeout |

## Application Modules

### AiScraper

Located in `src/oxylabs_ai_studio/apps/ai_scraper.py`, this module provides structured web scraping capabilities.

**Key Features:**
- Single URL content extraction
- Structured JSON output with custom schemas
- Markdown, HTML, CSV, and screenshot output formats
- JavaScript rendering support
- Geo-location proxy rotation

**Core Methods:**
- `scrape()` - Synchronous scraping operation
- `scrape_async()` - Asynchronous scraping operation
- `generate_schema()` - AI-powered schema generation from natural language

### AiSearch

Located in `src/oxylabs_ai_studio/apps/ai_search.py`, this module handles search engine result page (SERP) scraping.

**Key Features:**
- Full search with content retrieval
- Instant search for quick results (up to 10 results)
- Content extraction in markdown format
- Geo-location targeting for localized results

**Core Methods:**
- `search()` - Full search with content
- `instant_search()` - Fast search without content polling

### AiCrawler

Located in `src/oxylabs_ai_studio/apps/ai_crawler.py`, this module provides recursive web crawling with AI-guided extraction.

**Key Features:**
- Multi-page crawling with depth control
- AI-guided data extraction
- Structured JSON output with generated schemas
- Source limitation and filtering

**Core Methods:**
- `crawl()` - Start crawling operation
- `generate_schema()` - Generate extraction schema from prompt

## Request/Response Flow

### Standard API Call Flow

```mermaid
sequenceDiagram
    participant App as Application Module
    participant Client as API Client
    participant API as Oxylabs API
    
    App->>Client: call_api(url, method, body)
    Client->>Client: Prepare request headers
    Client->>Client: Attach auth (API Key)
    Client->>API: HTTP Request
    API-->>Client: Response
    Client-->>App: Processed Response
```

### Job-Based Operation Flow

For long-running operations (scraping, crawling, searching):

```mermaid
graph TD
    A[Submit Job] --> B[Get run_id]
    B --> C[Loop: Poll Status]
    C --> D{Response Status}
    D -->|202| C
    D -->|200 Completed| E[Return Data]
    D -->|Failed| F[Return Error Info]
```

## Data Models

### Common Response Models

| Model | Fields | Description |
|-------|--------|-------------|
| `AiScraperJob` | `run_id`, `message`, `data` | Scraping job result |
| `AiSearchJob` | `run_id`, `message`, `data` | Search job result |
| `AiCrawlerJob` | `run_id`, `message`, `data` | Crawling job result |
| `DataModel` | `type`, `content` | Extracted data container |

### Output Format Types

| Format | Type | Description |
|--------|------|-------------|
| `json` | `dict` | Structured JSON output |
| `markdown` | `str` | Markdown formatted text |
| `html` | `str` | Raw HTML content |
| `screenshot` | `str` | Base64 encoded image |
| `csv` | `str` | CSV formatted data |
| `toon` | `dict` | Token-Oriented Object Notation (TOON) |

## Authentication

The SDK uses API key-based authentication passed during initialization:

```python
from oxylabs_ai_studio.apps.ai_scraper import AiScraper

scraper = AiScraper(api_key="<API_KEY>")
```

The API key is:
- Stored in the `BaseApp` instance configuration
- Automatically attached to all outgoing HTTP requests
- Used to authenticate against the Oxylabs AI Studio API endpoint

## Configuration and Utils

### Logging Configuration

Located in `src/oxylabs_ai_studio/utils.py`, the SDK provides structured logging for debugging and monitoring:

- Configurable log levels
- Request/response logging
- Error tracing with context

### Error Handling

The architecture implements layered error handling:

| Layer | Error Type | Handling |
|-------|-----------|----------|
| Client | Network errors | Retry with backoff |
| API | HTTP errors | Exception with response details |
| Application | Business logic | Domain-specific exceptions |

### Timeout Configuration

| Parameter | Default | Description |
|-----------|---------|-------------|
| Request timeout | Varies by endpoint | Per-request timeout |
| Polling timeout | 300 seconds | Maximum wait for job completion |
| Poll interval | 2 seconds | Time between status checks |

## Usage Patterns

### Synchronous Usage

```python
from oxylabs_ai_studio.apps.ai_scraper import AiScraper

scraper = AiScraper(api_key="<API_KEY>")
result = scraper.scrape(
    url="https://example.com",
    output_format="json",
    schema={"type": "object", "properties": {"title": {"type": "string"}}},
)
```

### Asynchronous Usage

```python
import asyncio
from oxylabs_ai_studio.apps.ai_scraper import AiScraper

scraper = AiScraper(api_key="<API_KEY>")

async def main():
    result = await scraper.scrape_async(
        url="https://example.com",
        output_format="markdown"
    )
    print(result.data)

asyncio.run(main())
```

## Summary

The Client Architecture of oxylabs-ai-studio-py provides:

1. **Separation of Concerns**: HTTP communication isolated in `APIClient`
2. **Code Reuse**: Common functionality in `BaseApp` for all modules
3. **Extensibility**: Easy addition of new application modules
4. **Reliability**: Built-in polling, retry, and timeout mechanisms
5. **Flexibility**: Support for both sync and async operations

All application modules inherit from the shared base architecture, ensuring consistent behavior and API patterns across the SDK.

---

<a id='data-models'></a>

## Data Models

### Related Pages

Related topics: [Client Architecture](#client-architecture), [AI-Scraper Feature](#ai-scraper)

<details>
<summary>Relevant Source Files</summary>

The following source files were used to generate this page:

- [src/oxylabs_ai_studio/models.py](https://github.com/oxylabs/oxylabs-ai-studio-py/blob/main/src/oxylabs_ai_studio/models.py)
- [src/oxylabs_ai_studio/apps/ai_scraper.py](https://github.com/oxylabs/oxylabs-ai-studio-py/blob/main/src/oxylabs_ai_studio/apps/ai_scraper.py)
- [src/oxylabs_ai_studio/apps/ai_search.py](https://github.com/oxylabs/oxylabs-ai-studio-py/blob/main/src/oxylabs_ai_studio/apps/ai_search.py)
- [src/oxylabs_ai_studio/apps/ai_crawler.py](https://github.com/oxylabs/oxylabs-ai-studio-py/blob/main/src/oxylabs_ai_studio/apps/ai_crawler.py)
- [src/oxylabs_ai_studio/apps/browser_agent.py](https://github.com/oxylabs/oxylabs-ai-studio-py/blob/main/src/oxylabs_ai_studio/apps/browser_agent.py)
</details>

# Data Models

The oxylabs-ai-studio-py SDK provides a set of Pydantic-based data models that standardize how API responses are structured across all applications. These models serve as the foundation for type-safe data handling, ensuring consistent response parsing regardless of which AI-powered service is being used.

## Overview

The SDK implements a layered response model architecture:

| Layer | Model | Purpose |
|-------|-------|---------|
| Container | `DataModel` | Wraps the actual extracted content with type metadata |
| Response | `AiScraperJob`, `BrowserAgentJob`, `AiSearchJob`, `AiCrawlerJob` | Top-level job responses containing status, run ID, and data |

This design separates concerns between job metadata (run tracking, error handling) and the actual data payload, allowing flexible content types while maintaining a consistent interface.

## Core Response Models

All job response models inherit from Pydantic's `BaseModel` and share a common structure with three fields.

### Common Fields Across All Job Models

| Field | Type | Description |
|-------|------|-------------|
| `run_id` | `str` | Unique identifier for the API job execution |
| `message` | `str \| None` | Error code or status message (nullable) |
| `data` | Varies | The actual response payload (type depends on output format and model) |

### AiScraperJob

Located in `ai_scraper.py`, this model handles single-page scraping responses.

```python
class AiScraperJob(BaseModel):
    run_id: str
    message: str | None = None
    data: str | dict | None
```

**Data Type Mapping:**

| Output Format | Data Type |
|---------------|-----------|
| `json` | `dict` |
| `markdown` | `str` |
| `csv` | `str` (CSV formatted) |
| `screenshot` | `str` (base64 encoded) |

Source: [readme.md](https://github.com/oxylabs/oxylabs-ai-studio-py/blob/main/readme.md)

### BrowserAgentJob

Located in `browser_agent.py`, this model handles browser automation task responses. It differs from `AiScraperJob` by using a nested `DataModel` structure.

```python
class DataModel(BaseModel):
    type: Literal["json", "markdown", "html", "screenshot", "csv"]
    content: dict[str, Any] | str | None

class BrowserAgentJob(BaseModel):
    run_id: str
    message: str | None = None
    data: DataModel | None = None
```

**Supported Content Types:**

- `json` - Structured JSON data (dict)
- `markdown` - Markdown formatted text (str)
- `html` - Raw HTML content (str)
- `screenshot` - Base64 encoded image (str)
- `csv` - CSV formatted data (str)

Source: [agentic_code_guide.md](https://github.com/oxylabs/oxylabs-ai-studio-py/blob/main/agentic_code_guide.md)

### AiSearchJob

Located in `ai_search.py`, this model handles search engine results pages (SERP) responses.

```python
class AiSearchJob(BaseModel):
    run_id: str
    message: str | None = None
    data: Any  # Search results list
```

The data field contains a list of search results, where each result may include:
- URL
- Title
- Snippet
- Additional metadata depending on `return_content` parameter

Source: [src/oxylabs_ai_studio/apps/ai_search.py](https://github.com/oxylabs/oxylabs-ai-studio-py/blob/main/src/oxylabs_ai_studio/apps/ai_search.py)

### AiCrawlerJob

Located in `ai_crawler.py`, this model handles web crawling responses.

```python
class AiCrawlerJob(BaseModel):
    run_id: str
    message: str | None = None
    data: list[str] | dict | None  # Multiple crawled pages
```

The data field contains a list of extracted content from crawled pages, formatted according to the specified `output_format`.

Source: [src/oxylabs_ai_studio/apps/ai_crawler.py](https://github.com/oxylabs/oxylabs-ai-studio-py/blob/main/src/oxylabs_ai_studio/apps/ai_crawler.py)

## DataModel Container

The `DataModel` class provides a unified container for content extraction, wrapping both the data type and actual content together.

```mermaid
classDiagram
    class DataModel {
        +Literal type
        +content: dict~str, Any~ | str | None
    }
    
    class BrowserAgentJob {
        +str run_id
        +str | None message
        +DataModel | None data
    }
    
    BrowserAgentJob o-- DataModel : contains
```

## Schema Integration

The SDK supports both raw JSON schemas and Pydantic model integration for structured data extraction.

### JSON Schema Usage

Pass a dictionary following JSON Schema specification:

```python
schema = {
    "type": "object",
    "properties": {
        "price": {"type": "string"},
        "title": {"type": "string"}
    },
    "required": []
}

result = scraper.scrape(
    url="https://example.com",
    output_format="json",
    schema=schema
)
```

### Pydantic Model Usage

For type-safe extraction, use Pydantic models directly:

```python
from pydantic import BaseModel

from oxylabs_ai_studio.apps.ai_scraper import AiScraper

class Game(BaseModel):
    title: str
    genre: list[str]
    developer: str
    platform: str
    price: str
    description: str

scraper = AiScraper(api_key="<API_KEY>")
result = scraper.scrape(
    url="https://sandbox.oxylabs.io/products/1",
    output_format="json",
    schema=Game.model_json_schema(),
)
```

Source: [examples/scrape_pydantic_schema.py](https://github.com/oxylabs/oxylabs-ai-studio-py/blob/main/examples/scrape_pydantic_schema.py)

## Response Workflow

```mermaid
graph TD
    A[API Request] --> B{Output Format}
    B -->|json| C[Structured Dict]
    B -->|markdown| D[Text String]
    B -->|csv| E[CSV String]
    B -->|screenshot| F[Base64 String]
    
    C --> G[Response Model]
    D --> G
    E --> G
    F --> G
    
    G --> H[Job Response: run_id, message, data]
```

## Example: Accessing Response Data

```python
from oxylabs_ai_studio.apps.ai_scraper import AiScraper

scraper = AiScraper(api_key="<API_KEY>")

# JSON output
result = scraper.scrape(
    url="https://example.com",
    output_format="json",
    schema={"type": "object", "properties": {"title": {"type": "string"}}}
)

# Access the data
print(result.run_id)      # Job identifier
print(result.message)     # Error code if any
print(result.data)        # Extracted dict content

# Markdown output
result = scraper.scrape(
    url="https://example.com",
    output_format="markdown"
)
print(result.data)        # String content
```

Source: [examples/scrape_generated_schema.py](https://github.com/oxylabs/oxylabs-ai-studio-py/blob/main/examples/scrape_generated_schema.py)

## Error Handling

All job models support nullable `message` fields for error propagation:

```python
result = scraper.scrape(url="https://example.com", output_format="markdown")

if result.message:
    print(f"Error occurred: {result.message}")
else:
    print(f"Success: {result.data}")
```

## Output Format Summary

| Format | Data Structure | Schema Required |
|--------|---------------|-----------------|
| `json` | `dict` | Yes |
| `markdown` | `str` | No |
| `html` | `str` | No |
| `csv` | `str` | Yes |
| `screenshot` | `str` (base64) | No |
| `toon` | Varies | Yes (Browser Agent only) |

---

<a id='configuration-settings'></a>

## Configuration and Settings

### Related Pages

Related topics: [Error Handling and Logging](#error-handling-logging), [Client Architecture](#client-architecture)

<details>
<summary>Relevant Source Files</summary>

The following source files were used to generate this page:

- [src/oxylabs_ai_studio/settings.py](https://github.com/oxylabs/oxylabs-ai-studio-py/blob/main/src/oxylabs_ai_studio/settings.py)
- [src/oxylabs_ai_studio/__init__.py](https://github.com/oxylabs/oxylabs-ai-studio-py/blob/main/src/oxylabs_ai_studio/__init__.py)
- [src/oxylabs_ai_studio/apps/ai_search.py](https://github.com/oxylabs/oxylabs-ai-studio-py/blob/main/src/oxylabs_ai_studio/apps/ai_search.py)
- [src/oxylabs_ai_studio/apps/ai_crawler.py](https://github.com/oxylabs/oxylabs-ai-studio-py/blob/main/src/oxylabs_ai_studio/apps/ai_crawler.py)
- [src/oxylabs_ai_studio/apps/ai_scraper.py](https://github.com/oxylabs/oxylabs-ai-studio-py/blob/main/src/oxylabs_ai_studio/apps/ai_scraper.py)
</details>

# Configuration and Settings

## Overview

The oxylabs-ai-studio-py SDK provides a centralized configuration system built on Pydantic's `BaseSettings` class. This approach ensures type safety, environment variable validation, and sensible defaults for all configuration values. The configuration module serves as the single source of truth for API credentials and endpoint URLs that are shared across all application modules.

All SDK applications, including `AiScraper`, `AiCrawler`, `AiSearch`, and `BrowserAgent`, consume the same configuration settings, which keeps behavior consistent across modules. Source: [src/oxylabs_ai_studio/settings.py:1-9]()

## Core Configuration Model

The `Settings` class defines the available configuration parameters with their types, defaults, and validation rules.

```python
class Settings(BaseSettings):
    OXYLABS_AI_STUDIO_API_KEY: str | None = None
    OXYLABS_AI_STUDIO_API_URL: str = "https://api-aistudio.oxylabs.io"
```

### Configuration Parameters

| Parameter | Type | Default Value | Description |
|-----------|------|---------------|-------------|
| `OXYLABS_AI_STUDIO_API_KEY` | `str \| None` | `None` | API authentication key obtained from Oxylabs dashboard |
| `OXYLABS_AI_STUDIO_API_URL` | `str` | `"https://api-aistudio.oxylabs.io"` | Base URL for all API requests |

Source: [src/oxylabs_ai_studio/settings.py:1-9]()

## Environment Variable Loading

The SDK loads environment variables with the `python-dotenv` package. `load_dotenv()` is called at module import time, so values defined in a `.env` file are available before any configuration is read. Source: [src/oxylabs_ai_studio/settings.py:3]()

```mermaid
graph TD
    A[Import oxylabs_ai_studio] --> B[load_dotenv executes]
    B --> C[Environment Variables Loaded]
    C --> D["Settings() instantiated"]
    D --> E[API_KEY available to all Apps]
```

## Application Initialization Pattern

All SDK applications accept an optional `api_key` parameter in their constructors. When provided, the key is used directly. When omitted, the applications retrieve the API key from the global `settings` object.

```python
# Direct API key usage
scraper = AiScraper(api_key="<API_KEY>")

# Environment-based API key usage
scraper = AiScraper()  # Reads from OXYLABS_AI_STUDIO_API_KEY
```

This dual approach provides flexibility for different deployment scenarios:

1. **Explicit Parameter**: API key passed directly to constructor
2. **Environment Variable**: API key loaded from `OXYLABS_AI_STUDIO_API_KEY` environment variable

Source: [src/oxylabs_ai_studio/apps/ai_scraper.py]() [src/oxylabs_ai_studio/apps/ai_crawler.py]()

## Configuration Access in Applications

### AiSearch Application

The `AiSearch` class initializes its HTTP client with the provided API key and uses the configured API URL for all requests.

```python
def get_client(self) -> httpx.Client:
    return httpx.Client(
        headers={
            "Authorization": f"Bearer {self.api_key or settings.OXYLABS_AI_STUDIO_API_KEY}",
            "Content-Type": "application/json",
        },
        base_url=settings.OXYLABS_AI_STUDIO_API_URL,
        timeout=httpx.Timeout(60.0, connect=10.0),
    )
```

| Endpoint | HTTP Method | Purpose |
|----------|-------------|---------|
| `/search` | POST | Full search with content rendering |
| `/search/instant` | POST | Fast search returning up to 10 results |

Source: [src/oxylabs_ai_studio/apps/ai_search.py]()

### AiCrawler Application

The crawler uses the same client configuration pattern, with the API key and base URL sourced from settings:

```python
def get_client(self) -> httpx.Client:
    return httpx.Client(
        headers={
            "Authorization": f"Bearer {self.api_key or settings.OXYLABS_AI_STUDIO_API_KEY}",
            "Content-Type": "application/json",
        },
        base_url=settings.OXYLABS_AI_STUDIO_API_URL,
        timeout=httpx.Timeout(60.0, connect=10.0),
    )
```

| Endpoint | HTTP Method | Purpose |
|----------|-------------|---------|
| `/crawl/run` | POST | Initiate a crawl job |
| `/crawl/run/data` | GET | Retrieve crawl results |
| `/crawl/generate-params` | POST | Generate JSON schema from prompt |

Source: [src/oxylabs_ai_studio/apps/ai_crawler.py]()

### AiScraper Application

The scraper follows the identical pattern for HTTP client initialization:

```python
def get_client(self) -> httpx.Client:
    return httpx.Client(
        headers={
            "Authorization": f"Bearer {self.api_key or settings.OXYLABS_AI_STUDIO_API_KEY}",
            "Content-Type": "application/json",
        },
        base_url=settings.OXYLABS_AI_STUDIO_API_URL,
        timeout=httpx.Timeout(60.0, connect=10.0),
    )
```

| Endpoint | HTTP Method | Purpose |
|----------|-------------|---------|
| `/scrape` | POST | Initiate a scrape job |
| `/scrape/schema` | POST | Generate JSON schema from prompt |

Source: [src/oxylabs_ai_studio/apps/ai_scraper.py]()

## HTTP Client Configuration

All applications share identical HTTP client configuration through a standardized `get_client()` method:

| Parameter | Value | Description |
|-----------|-------|-------------|
| `Authorization` | `Bearer {API_KEY}` | Bearer token authentication using the API key |
| `Content-Type` | `application/json` | Request payload format |
| `timeout` | 60.0s (default for read, write, and pool), 10.0s (connect) | Request timeout configuration |
| `base_url` | `settings.OXYLABS_AI_STUDIO_API_URL` | API base endpoint |

## Setting Up Environment Variables

### Recommended `.env` File

Create a `.env` file in your project root with the following content:

```bash
OXYLABS_AI_STUDIO_API_KEY=your_api_key_here
```
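The `get_client()` listings above fall back from an explicit `api_key` to the value held in settings. The following is a minimal sketch of that fallback order using only the standard library. `resolve_api_key` is a hypothetical helper written for illustration, not an SDK function, and raising `ValueError` when no key is found is an assumption of this sketch rather than documented SDK behavior.

```python
import os
from typing import Mapping, Optional

ENV_VAR = "OXYLABS_AI_STUDIO_API_KEY"


def resolve_api_key(
    explicit_key: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Return the explicit key if given, otherwise the environment value.

    Hypothetical helper: mirrors the `self.api_key or settings...` fallback.
    """
    source = os.environ if env is None else env
    key = explicit_key or source.get(ENV_VAR)
    if not key:
        # Assumption of this sketch: fail early instead of sending an empty token.
        raise ValueError(f"{ENV_VAR} is not set and no api_key was passed.")
    return key
```

Passing `env` explicitly keeps the helper easy to test without touching the real process environment.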

### Installation and Usage Flow

```mermaid
graph LR
    A[Install SDK<br>pip install oxylabs-ai-studio] --> B[Create .env file]
    B --> C[Set OXYLABS_AI_STUDIO_API_KEY]
    C --> D[Import applications]
    D --> E[Initialize with or without api_key]
    E --> F[Make API requests]
```

## Configuration Best Practices

### Development Environment

```python
from oxylabs_ai_studio.apps.ai_scraper import AiScraper

# Option 1: Use .env file
scraper = AiScraper()  # Automatically reads from environment

# Option 2: Explicit API key
scraper = AiScraper(api_key="your_dev_key")
```

### Production Environment

In production deployments, use environment variables directly:

```bash
export OXYLABS_AI_STUDIO_API_KEY="your_production_key"
```

```python
from oxylabs_ai_studio.apps.ai_scraper import AiScraper

scraper = AiScraper()  # Uses production API key from environment
```

### Security Considerations

1. **Never commit API keys** to version control
2. **Use environment variables** for production deployments
3. **Use `.gitignore`** to exclude `.env` files
4. **Rotate API keys** periodically through the Oxylabs dashboard

## Module Exports

The SDK exports the `settings` object for direct access when needed:

```python
from oxylabs_ai_studio import settings

print(settings.OXYLABS_AI_STUDIO_API_URL)
```

Source: [src/oxylabs_ai_studio/__init__.py]()

---

<a id='error-handling-logging'></a>

## Error Handling and Logging

### Related Pages

Related topics: [Client Architecture](#client-architecture), [Configuration and Settings](#configuration-settings)

<details>
<summary>Related source files</summary>

The following source files were used to generate this page:

- [src/oxylabs_ai_studio/logger.py](https://github.com/oxylabs/oxylabs-ai-studio-py/blob/main/src/oxylabs_ai_studio/logger.py)
- [src/oxylabs_ai_studio/apps/ai_scraper.py](https://github.com/oxylabs/oxylabs-ai-studio-py/blob/main/src/oxylabs_ai_studio/apps/ai_scraper.py)
- [src/oxylabs_ai_studio/apps/ai_search.py](https://github.com/oxylabs/oxylabs-ai-studio-py/blob/main/src/oxylabs_ai_studio/apps/ai_search.py)
- [src/oxylabs_ai_studio/apps/ai_crawler.py](https://github.com/oxylabs/oxylabs-ai-studio-py/blob/main/src/oxylabs_ai_studio/apps/ai_crawler.py)
- [src/oxylabs_ai_studio/apps/browser_agent.py](https://github.com/oxylabs/oxylabs-ai-studio-py/blob/main/src/oxylabs_ai_studio/apps/browser_agent.py)
- [src/oxylabs_ai_studio/apps/ai_map.py](https://github.com/oxylabs/oxylabs-ai-studio-py/blob/main/src/oxylabs_ai_studio/apps/ai_map.py)
</details>

# Error Handling and Logging

## Overview

The oxylabs-ai-studio-py SDK pairs a small logging module with a uniform error-handling pattern so that developers can monitor SDK operations, debug issues, and handle failures. The design favors simplicity while still exposing enough detail for production monitoring.

The logging infrastructure uses Python's standard `logging` module with a package-scoped namespace, ensuring all SDK components can be monitored uniformly. Error handling follows a polling-based pattern for asynchronous operations, with explicit timeout management and user cancellation support.

## Logging Architecture

### Logger Configuration

The SDK defines a centralized logging configuration through the `logger.py` module.

```python
LOGGER_NAME = "oxylabs_ai_studio"
DEFAULT_LOG_LEVEL = logging.INFO
```

Source: [src/oxylabs_ai_studio/logger.py:1-14](https://github.com/oxylabs/oxylabs-ai-studio-py/blob/main/src/oxylabs_ai_studio/logger.py)

### Logger Initialization

The SDK automatically configures logging upon module import using a module-level initialization pattern:

```python
_default_logger = logging.getLogger(LOGGER_NAME)
if not _default_logger.handlers:
    configure_logging()
```

Source: [src/oxylabs_ai_studio/logger.py:49-52](https://github.com/oxylabs/oxylabs-ai-studio-py/blob/main/src/oxylabs_ai_studio/logger.py)

### Default Log Format

| Component | Value |
|-----------|-------|
| Timestamp | `%(asctime)s` |
| Logger Name | `%(name)s` |
| Log Level | `%(levelname)s` |
| Message | `%(message)s` |

The default format string produces output like: `2024-01-15 10:30:45,123 - oxylabs_ai_studio - INFO - Starting scrape operation`
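To see how the four components in the table combine, the sketch below formats a single record with the standard library's `logging.Formatter`. It does not import the SDK. The record is built by hand, and the `pathname` and line number are placeholder values.

```python
import logging

DEFAULT_FORMAT = "%(asctime)s - %(name)s - %(levelname)s - %(message)s"


def render_example_line(message: str) -> str:
    """Format one INFO record the way the default format string would."""
    record = logging.LogRecord(
        name="oxylabs_ai_studio",
        level=logging.INFO,
        pathname="example.py",  # placeholder, not used by this format
        lineno=1,
        msg=message,
        args=(),
        exc_info=None,
    )
    return logging.Formatter(DEFAULT_FORMAT).format(record)


line = render_example_line("Starting scrape operation")
print(line)
```

The timestamp reflects the moment the record is created, so only the part after the first ` - ` separator is stable between runs.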

## Core Logging Functions

### get_logger()

Returns a logger instance for the SDK. Child loggers automatically inherit the parent's configuration.

```python
def get_logger(name: str | None = None) -> logging.Logger:
    if name is None:
        logger_name = LOGGER_NAME
    elif not name.startswith(LOGGER_NAME):
        logger_name = f"{LOGGER_NAME}.{name}"
    else:
        logger_name = name

    logger = logging.getLogger(logger_name)
    if logger_name != LOGGER_NAME:
        logger.handlers.clear()
        logger.propagate = True
    return logger
```

Source: [src/oxylabs_ai_studio/logger.py:16-32](https://github.com/oxylabs/oxylabs-ai-studio-py/blob/main/src/oxylabs_ai_studio/logger.py)
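The naming rule inside `get_logger()` can be tried in isolation. The sketch below copies only the name-resolution branch from the listing into a standalone function (`resolve_logger_name` is a hypothetical name used here for illustration) and then checks how the standard library links the resulting child logger to its parent.

```python
import logging
from typing import Optional

LOGGER_NAME = "oxylabs_ai_studio"


def resolve_logger_name(name: Optional[str] = None) -> str:
    """Apply the same prefixing rule as the get_logger() listing."""
    if name is None:
        return LOGGER_NAME
    if not name.startswith(LOGGER_NAME):
        return f"{LOGGER_NAME}.{name}"
    return name


names = [
    resolve_logger_name(n)
    for n in (None, "browser_agent", "oxylabs_ai_studio.ai_scraper", "__main__")
]

# The dotted name is what makes the stdlib treat one logger as a child of another.
parent = logging.getLogger(LOGGER_NAME)
child = logging.getLogger(resolve_logger_name("browser_agent"))
```

Because the check is a plain `startswith`, a name such as `oxylabs_ai_studio_extras` would be returned unchanged even though it is not a dotted child of the SDK logger.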

### configure_logging()

Configures the root SDK logger with custom settings.

```python
def configure_logging(
    level: int = DEFAULT_LOG_LEVEL,
    format_string: str | None = None,
    handler: logging.Handler | None = None,
) -> None:
    logger = logging.getLogger(LOGGER_NAME)
    for existing_handler in logger.handlers[:]:
        logger.removeHandler(existing_handler)

    logger.setLevel(level)
    if handler is None:
        handler = logging.StreamHandler(sys.stderr)

    if format_string is None:
        format_string = "%(asctime)s - %(name)s - %(levelname)s - %(message)s"

    formatter = logging.Formatter(format_string)
    handler.setFormatter(formatter)
    logger.addHandler(handler)
    logger.propagate = False
```

Source: [src/oxylabs_ai_studio/logger.py:35-48](https://github.com/oxylabs/oxylabs-ai-studio-py/blob/main/src/oxylabs_ai_studio/logger.py)

### Configuration Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `level` | `int` | `logging.INFO` | Minimum log level to record |
| `format_string` | `str \| None` | `"%(asctime)s - %(name)s - %(levelname)s - %(message)s"` | Custom format pattern |
| `handler` | `logging.Handler \| None` | `StreamHandler(sys.stderr)` | Output handler |
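The `handler` parameter makes it possible to send SDK records somewhere other than `stderr`. The sketch below shows the same idea with the standard library alone, by attaching an in-memory handler directly to the `oxylabs_ai_studio` logger name. It does not call `configure_logging()`, so it runs without the SDK installed. With the SDK available, passing the handler to `configure_logging()` would be the more direct route.

```python
import io
import logging

# Capture records in memory instead of writing them to stderr.
buffer = io.StringIO()
handler = logging.StreamHandler(buffer)
handler.setFormatter(logging.Formatter("%(name)s [%(levelname)s] %(message)s"))

sdk_logger = logging.getLogger("oxylabs_ai_studio")
sdk_logger.setLevel(logging.DEBUG)
sdk_logger.addHandler(handler)

# A child logger has no handler of its own; its records propagate to the parent.
logging.getLogger("oxylabs_ai_studio.ai_scraper").debug("polling attempt 1")

captured = buffer.getvalue()
```

Note that the `configure_logging()` listing always calls `handler.setFormatter(...)`, so a formatter preset on a custom handler would be replaced there. To control the layout through that function, pass `format_string` as well.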

## Error Handling Patterns

### Polling-Based Job Status Handling

All async operations in the SDK follow a consistent polling pattern to check job completion status.

```mermaid
graph TD
    A[Start Job Request] --> B[Submit to API]
    B --> C[Get run_id]
    C --> D{Polling Loop}
    D -->|HTTP 202| E[Wait POLL_INTERVAL_SECONDS]
    E --> D
    D -->|HTTP 200| F{Check Status}
    F -->|completed| G[Return Success Data]
    F -->|failed| H[Return Failure with Error Code]
    F -->|processing| E
    D -->|timeout| I[Raise TimeoutError]
    D -->|KeyboardInterrupt| J[Log Cancellation & Raise]
```
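The flow in the diagram can be sketched as a small loop. This is an illustration under stated assumptions, not the SDK's code: `poll_job` and `POLL_MAX_ATTEMPTS` are hypothetical names, `fetch_status` stands in for the HTTP call and returns a `(status_code, body)` pair, and the result is a plain dictionary rather than one of the SDK's job models.

```python
import time
from typing import Any, Callable, Dict, Tuple

POLL_INTERVAL_SECONDS = 5
POLL_MAX_ATTEMPTS = 120  # 120 attempts x 5 s = 600 s polling budget


def poll_job(
    fetch_status: Callable[[], Tuple[int, Dict[str, Any]]],
    max_attempts: int = POLL_MAX_ATTEMPTS,
    interval: float = POLL_INTERVAL_SECONDS,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll until the job completes, fails, or the attempt budget runs out."""
    for _ in range(max_attempts):
        status_code, body = fetch_status()
        if status_code == 202:
            # Accepted but still processing: wait and ask again.
            sleep(interval)
            continue
        if status_code != 200:
            raise Exception(f"Failed to poll job: {body}")
        status = body.get("status")
        if status == "completed":
            return {"data": body.get("data"), "message": None}
        if status == "failed":
            return {"data": None, "message": body.get("error_code")}
        # Any other status (for example "processing") means keep waiting.
        sleep(interval)
    raise TimeoutError("Job did not finish within the polling budget.")
```

Injecting `sleep` as a parameter is a design choice of this sketch: it lets the loop be exercised in tests without real waiting.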

### Timeout Management

Each application module defines its own timeout threshold and polling configuration.

| Application | Timeout (seconds) | Poll Interval (seconds) | Max Attempts |
|-------------|-------------------|------------------------|--------------|
| Browser Agent | 600 (10 min) | 5 | 120 |
| AI Crawler | 600 (10 min) | 5 | 120 |
| AI Scraper | 600 (10 min) | 5 | 120 |
| AI Search | 600 (10 min) | 5 | 120 |
| AI Map | 600 (10 min) | 5 | 120 |

Sources:
- [src/oxylabs_ai_studio/apps/browser_agent.py:1-15](https://github.com/oxylabs/oxylabs-ai-studio-py/blob/main/src/oxylabs_ai_studio/apps/browser_agent.py)
- [src/oxylabs_ai_studio/apps/ai_crawler.py:1-25](https://github.com/oxylabs/oxylabs-ai-studio-py/blob/main/src/oxylabs_ai_studio/apps/ai_crawler.py)
- [src/oxylabs_ai_studio/apps/ai_scraper.py](https://github.com/oxylabs/oxylabs-ai-studio-py/blob/main/src/oxylabs_ai_studio/apps/ai_scraper.py)
- [src/oxylabs_ai_studio/apps/ai_search.py](https://github.com/oxylabs/oxylabs-ai-studio-py/blob/main/src/oxylabs_ai_studio/apps/ai_search.py)
- [src/oxylabs_ai_studio/apps/ai_map.py](https://github.com/oxylabs/oxylabs-ai-studio-py/blob/main/src/oxylabs_ai_studio/apps/ai_map.py)

### Status Response Handling

The API returns standardized status responses that the SDK interprets:

```python
if resp_body["status"] == "completed":
    return JobResult(run_id=run_id, data=resp_body["data"])
if resp_body["status"] == "failed":
    return JobResult(run_id=run_id, message=resp_body.get("error_code"), data=None)
```

### API Error Responses

| HTTP Status | Meaning | SDK Action |
|-------------|---------|------------|
| 200 | Success | Process response body |
| 202 | Accepted, still processing | Continue polling |
| 4xx | Client error | Raise `Exception` with response text |
| 5xx | Server error | Raise `Exception` with response text |

## Job Result Models

### Common Response Structure

All job results follow a consistent Pydantic model structure:

```python
from typing import Any

from pydantic import BaseModel


class AiScraperJob(BaseModel):
    run_id: str
    message: str | None = None
    data: str | dict | None


# DataModel is defined in the "DataModel for Browser Agent" section below.
class BrowserAgentJob(BaseModel):
    run_id: str
    message: str | None = None
    data: DataModel | None = None


class AiSearchJob(BaseModel):
    run_id: str
    message: str | None = None
    # Populated from resp_body["data"]; the field's declared type is not shown here.
    data: Any


class AiCrawlerJob(BaseModel):
    run_id: str
    message: str | None = None
    data: list[dict[str, Any]] | list[str] | None = None
```

Sources:
- [src/oxylabs_ai_studio/apps/ai_scraper.py](https://github.com/oxylabs/oxylabs-ai-studio-py/blob/main/src/oxylabs_ai_studio/apps/ai_scraper.py)
- [src/oxylabs_ai_studio/apps/browser_agent.py:28-35](https://github.com/oxylabs/oxylabs-ai-studio-py/blob/main/src/oxylabs_ai_studio/apps/browser_agent.py)
- [src/oxylabs_ai_studio/apps/ai_search.py](https://github.com/oxylabs/oxylabs-ai-studio-py/blob/main/src/oxylabs_ai_studio/apps/ai_search.py)
- [src/oxylabs_ai_studio/apps/ai_crawler.py:20-24](https://github.com/oxylabs/oxylabs-ai-studio-py/blob/main/src/oxylabs_ai_studio/apps/ai_crawler.py)

### DataModel for Browser Agent

```python
class DataModel(BaseModel):
    type: Literal["json", "markdown", "html", "screenshot", "csv", "toon"]
    content: dict[str, Any] | str | None
```

## Exception Handling

### ValueError Exceptions

The SDK validates input parameters and raises `ValueError` for missing required fields:

```python
if output_format in ["json", "csv", "toon"] and schema is None:
    raise ValueError(
        "openapi_schema is required when output_format is json, csv or toon.",
    )
```

Source: [src/oxylabs_ai_studio/apps/browser_agent.py:50-54](https://github.com/oxylabs/oxylabs-ai-studio-py/blob/main/src/oxylabs_ai_studio/apps/browser_agent.py)
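Callers that build requests dynamically may want to run the same check before handing arguments to the SDK. The sketch below restates the rule as a standalone function. `check_schema_requirement` and `SCHEMA_REQUIRED_FORMATS` are hypothetical names used for illustration, and the error text is copied from the listing above.

```python
from typing import Any, Dict, Optional

# Formats that need a schema, as listed in the check above.
SCHEMA_REQUIRED_FORMATS = ("json", "csv", "toon")


def check_schema_requirement(
    output_format: str, schema: Optional[Dict[str, Any]]
) -> None:
    """Raise ValueError when a structured format is requested without a schema."""
    if output_format in SCHEMA_REQUIRED_FORMATS and schema is None:
        raise ValueError(
            "openapi_schema is required when output_format is json, csv or toon."
        )
```

Formats outside that tuple, such as `markdown`, pass the check with or without a schema.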

### Schema Generation Errors

```python
if response.status_code != 200:
    raise Exception(f"Failed to generate schema: {response.text}")
```

### Timeout Errors

```python
raise TimeoutError(f"Failed to scrape {url}: timeout.")
raise TimeoutError(f"Failed to search {query=}")
raise TimeoutError(f"Failed to crawl {url}: timeout.")
raise TimeoutError(f"Failed to map {url}: timeout.")
```

### API Call Errors

```python
if response.status_code != 200:
    raise Exception(f"Failed to perform instant search: `{response.text}`")
```

### User Cancellation

The SDK gracefully handles `KeyboardInterrupt` exceptions:

```python
except KeyboardInterrupt:
    logger.info("[Cancelled] Scraping was cancelled by user.")
    raise KeyboardInterrupt from None
```

| Exception Type | Trigger | User Message |
|----------------|---------|---------------|
| `ValueError` | Missing required parameter | Parameter-specific message |
| `Exception` | API returns non-200 status | API response text |
| `TimeoutError` | Job exceeds timeout threshold | Operation-specific timeout message |
| `KeyboardInterrupt` | User cancels operation | "[Cancelled] {operation} was cancelled by user." |
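When handling these exception types together, the order of the `except` clauses matters, because `TimeoutError` and `ValueError` are both subclasses of `Exception`. The sketch below is one way to map the table onto labels. `run_and_classify` and the label strings are hypothetical and exist only for illustration.

```python
from typing import Any, Callable, Tuple


def run_and_classify(operation: Callable[[], Any]) -> Tuple[str, Any]:
    """Run `operation` and map the exception types from the table to labels."""
    try:
        return ("ok", operation())
    except ValueError as exc:
        return ("invalid_input", str(exc))
    except TimeoutError as exc:
        # Must come before `except Exception`: TimeoutError is an Exception subclass.
        return ("timeout", str(exc))
    except Exception as exc:
        return ("api_error", str(exc))
    # KeyboardInterrupt derives from BaseException, so none of the clauses
    # above catch it and it propagates to the caller.
```

A caller can pass any zero-argument callable, for example a `lambda` that wraps an SDK method call, and branch on the returned label.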

## Logging Usage Examples

### Basic Logger Usage

```python
from oxylabs_ai_studio.logger import get_logger

logger = get_logger(__name__)  # Creates logger for current module
logger.info("Starting operation")
logger.warning("Potential issue detected")
logger.error("Operation failed")
```

### Custom Logging Configuration

```python
from oxylabs_ai_studio.logger import configure_logging
import logging

# Set DEBUG level with custom format
configure_logging(
    level=logging.DEBUG,
    format_string="%(asctime)s [%(levelname)s] %(name)s: %(message)s"
)
```

### Module-Specific Logging

```python
from oxylabs_ai_studio.logger import get_logger

# For SDK internal modules
browser_logger = get_logger("browser_agent")
scraper_logger = get_logger("ai_scraper")

# Child loggers propagate to parent
browser_logger.info("Browser agent started")  # Logged as "oxylabs_ai_studio.browser_agent"
```

## Best Practices

### 1. Configure Logging Early

Set up logging configuration before initializing SDK clients:

```python
from oxylabs_ai_studio.logger import configure_logging
import logging

configure_logging(level=logging.DEBUG)

from oxylabs_ai_studio.apps.ai_scraper import AiScraper
scraper = AiScraper(api_key="your_key")
```

### 2. Handle Timeouts Appropriately

Wrap SDK calls in try-except blocks:

```python
from oxylabs_ai_studio.apps.ai_scraper import AiScraper
from oxylabs_ai_studio.logger import get_logger

logger = get_logger(__name__)
scraper = AiScraper(api_key="your_key")
try:
    result = scraper.scrape(url="https://example.com", output_format="markdown")
except TimeoutError as e:
    logger.error(f"Scraping timed out: {e}")
except Exception as e:
    logger.error(f"Scraping failed: {e}")
```

### 3. Check Job Status for Errors

Always verify the result's `message` field:

```python
# Assumes `scraper` and `logger` are already set up as in the previous example.
result = scraper.scrape(url="https://example.com")
if result.message:
    logger.warning(f"Job completed with message: {result.message}")
if result.data is None:
    logger.error("Job failed - no data returned")
```
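The same check can be folded into a helper that turns a job result into a short outcome string. This sketch relies only on the `message` and `data` fields shown in the job models. `job_outcome` is a hypothetical helper, and `SimpleNamespace` objects stand in for real job results so that the example runs without the SDK.

```python
from types import SimpleNamespace
from typing import Any


def job_outcome(result: Any) -> str:
    """Summarize a job result using its `data` and `message` fields."""
    if result.data is not None:
        return "completed"
    if result.message:
        # On failure the status handler stores the API error code in `message`.
        return f"failed: {result.message}"
    return "failed: no data and no error code"


# Stand-ins for SDK job objects (assumption: same field names as the models).
ok_job = SimpleNamespace(run_id="r1", message=None, data="# Title")
failed_job = SimpleNamespace(run_id="r2", message="SOME_ERROR_CODE", data=None)

outcomes = [job_outcome(ok_job), job_outcome(failed_job)]
```

The error code `SOME_ERROR_CODE` is a made-up placeholder. The actual codes returned by the API are not listed in this document.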

### 4. Handle User Cancellation

Gracefully handle keyboard interrupts:

```python
from oxylabs_ai_studio.logger import get_logger
from oxylabs_ai_studio.apps.ai_crawler import AiCrawler

logger = get_logger(__name__)
crawler = AiCrawler(api_key="your_key")

try:
    result = crawler.crawl(url="https://example.com", user_prompt="Extract data")
except KeyboardInterrupt:
    logger.info("Crawl operation was cancelled by user")
    # Perform cleanup if needed
```

## Summary

The oxylabs-ai-studio-py SDK provides a unified logging and error handling system that:

- Uses Python's standard `logging` module with package-scoped namespaces
- Configures logging automatically on module import
- Supports custom log levels, formats, and handlers
- Implements polling-based async operation handling with configurable timeouts
- Returns consistent Pydantic model responses with status information
- Provides user-friendly error messages and cancellation handling
- Follows a single pattern across all application modules for predictability

---


## Doramagic Pitfall Log

Project: oxylabs/oxylabs-ai-studio-py

Summary: Found 8 potential pitfall items; 0 are high/blocking. Highest priority: identity - repository name and install name differ.

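## 1. identity · Repository name and install name differ

- Severity: medium
- Evidence strength: runtime_trace
- Finding: The repository name `oxylabs-ai-studio-py` does not exactly match the install name `oxylabs-ai-studio`.
- User impact: Users who search for the package by repository name, or for the repository by package name, can easily end up in the wrong place.
- Suggested check: Confirm the package-name mapping and the official README instructions on npm/PyPI/GitHub.
- Reproduction command: `pip install oxylabs-ai-studio`
- Guardrail action: The page must show both the repo name and the actual install name so users do not search for the wrong package.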
- Evidence: identity.distribution | github_repo:1003630893 | https://github.com/oxylabs/oxylabs-ai-studio-py | repo=oxylabs-ai-studio-py; install=oxylabs-ai-studio

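## 2. capability · Capability assessment relies on assumptions

- Severity: medium
- Evidence strength: source_linked
- Finding: README/documentation is current enough for a first validation pass.
- User impact: If the assumption does not hold, users will not get the promised capability.
- Suggested check: Turn the assumption into a downstream verification checklist.
- Guardrail action: Assumptions must be turned into verification items; they must not be stated as fact until verified.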
- Evidence: capability.assumptions | github_repo:1003630893 | https://github.com/oxylabs/oxylabs-ai-studio-py | README/documentation is current enough for a first validation pass.

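## 3. maintenance · Source evidence: v.0.2.19

- Severity: medium
- Evidence strength: source_linked
- Finding: GitHub community evidence shows the project has a maintenance/version-related issue pending verification: v.0.2.19
- User impact: May affect upgrades, migrations, or version selection.
- Suggested check: The source indicates that a fix, workaround, or version change may already exist; the manual must state the applicable version.
- Guardrail action: Do not inflate this into a definitive conclusion detached from the source link; state the applicable version and review status.
- Evidence: community_evidence:github | cevd_96a2e22cbea94aac8c2788f763ce7c04 | https://github.com/oxylabs/oxylabs-ai-studio-py/releases/tag/v0.2.19 | Usage condition pending verification, exposed by source type github_release.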

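## 4. maintenance · Maintenance activity unknown

- Severity: medium
- Evidence strength: source_linked
- Finding: last_activity_observed was not recorded.
- User impact: New, abandoned, and active projects get lumped together, which lowers confidence in the recommendation.
- Suggested check: Add signals from recent GitHub commits, releases, and issue/PR responses.
- Guardrail action: While maintenance activity is unknown, the recommendation must not be marked as high trust.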
- Evidence: evidence.maintainer_signals | github_repo:1003630893 | https://github.com/oxylabs/oxylabs-ai-studio-py | last_activity_observed missing

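## 5. security_permissions · Downstream validation found risk items

- Severity: medium
- Evidence strength: source_linked
- Finding: no_demo
- User impact: Downstream has already requested a review; this must not be downplayed on the page.
- Suggested check: Send to the security/permissions governance review queue.
- Guardrail action: While a downstream risk exists, the review/recommendation must stay downgraded.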
- Evidence: downstream_validation.risk_items | github_repo:1003630893 | https://github.com/oxylabs/oxylabs-ai-studio-py | no_demo; severity=medium

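## 6. security_permissions · Scoring risk present

- Severity: medium
- Evidence strength: source_linked
- Finding: no_demo
- User impact: The risk affects whether the package is suitable for ordinary users to install.
- Suggested check: Record the risk in the boundary card and confirm whether manual review is needed.
- Guardrail action: Scoring risks must be recorded in the boundary card, not kept only as an internal score.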
- Evidence: risks.scoring_risks | github_repo:1003630893 | https://github.com/oxylabs/oxylabs-ai-studio-py | no_demo; severity=medium

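## 7. maintenance · Issue/PR response quality unknown

- Severity: low
- Evidence strength: source_linked
- Finding: issue_or_pr_quality=unknown.
- User impact: Users cannot tell whether anyone will maintain the project when they hit a problem.
- Suggested check: Sample recent issues/PRs to see whether they go unhandled for long periods.
- Guardrail action: While issue/PR responsiveness is unknown, the maintenance risk must be flagged.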
- Evidence: evidence.maintainer_signals | github_repo:1003630893 | https://github.com/oxylabs/oxylabs-ai-studio-py | issue_or_pr_quality=unknown

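## 8. maintenance · Release cadence unclear

- Severity: low
- Evidence strength: source_linked
- Finding: release_recency=unknown.
- User impact: Install commands and documentation may lag behind the code, making it more likely that users hit problems.
- Suggested check: Confirm that the latest release/tag matches the README install command.
- Guardrail action: When the release cadence is unknown or stale, the installation instructions must note possible drift.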
- Evidence: evidence.maintainer_signals | github_repo:1003630893 | https://github.com/oxylabs/oxylabs-ai-studio-py | release_recency=unknown

<!-- canonical_name: oxylabs/oxylabs-ai-studio-py; human_manual_source: deepwiki_human_wiki -->
