Doramagic Project Pack · Human Manual
oxylabs-ai-studio-py
This guide covers the installation process for the Oxylabs AI Studio Python SDK (oxylabs-ai-studio). The SDK provides a Python interface for interacting with Oxylabs AI Studio API services...
Installation Guide
Related topics: Quick Start Guide
Overview
This guide covers the installation process for the Oxylabs AI Studio Python SDK (oxylabs-ai-studio). The SDK provides a Python interface for interacting with Oxylabs AI Studio API services, including AI-Scraper, AI-Crawler, AI-Browser-Agent, and other data extraction tools.
Sources: readme.md:1-10
System Requirements
| Requirement | Minimum Version | Notes |
|---|---|---|
| Python | 3.10+ | Earlier versions are not supported |
| Package Manager | pip | Standard Python package installer |
| API Key | Required | Must be obtained from Oxylabs AI Studio |
Sources: readme.md:10-11
Prerequisites
Before installing the SDK, ensure your environment meets the following requirements:
Python Version Check
python --version
# or
python3 --version
The output should show Python 3.10 or higher.
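The same check can be made from inside Python, which is handy in setup scripts. This is a minimal sketch using only the standard library; the helper name `meets_minimum` is an illustrative assumption and not part of the SDK.

```python
import sys

MIN_PYTHON = (3, 10)  # minimum version stated in the requirements table


def meets_minimum(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True when the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    if meets_minimum():
        print("Python version OK:", sys.version.split()[0])
    else:
        print("Python 3.10 or newer is required")
```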
pip Availability
pip --version
# or
pip3 --version
Installation Methods
Standard Installation (Recommended)
The official release version can be installed directly from PyPI using pip:
pip install oxylabs-ai-studio
Sources: readme.md:14
Installation from Source
For development or testing purposes, you can install from the source repository:
git clone https://github.com/oxylabs/oxylabs-ai-studio-py.git
cd oxylabs-ai-studio-py
pip install -e .
Post-Installation Verification
After installation, verify the SDK is properly installed:
from oxylabs_ai_studio.apps.ai_scraper import AiScraper
from oxylabs_ai_studio.apps.ai_crawler import AiCrawler
from oxylabs_ai_studio.apps.ai_search import AiSearch
from oxylabs_ai_studio.apps.browser_agent import BrowserAgent
from oxylabs_ai_studio.apps.ai_map import AiMap
print("Oxylabs AI Studio SDK imported successfully")
If no import errors occur, the installation was successful.
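To confirm the installation without importing any SDK modules, the installed distribution can be looked up through the standard library. A minimal sketch, assuming the distribution name matches the PyPI name used in the pip command above; `installed_version` is a hypothetical helper.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string for a distribution, or None."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("oxylabs-ai-studio")
    if version is None:
        print("oxylabs-ai-studio is not installed in this environment")
    else:
        print("oxylabs-ai-studio version:", version)
```

Running this inside the wrong virtual environment reports the package as missing, which is a common cause of ImportError.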
SDK Components
The SDK includes the following main components:
| Component | Module Path | Purpose |
|---|---|---|
| AI Scraper | oxylabs_ai_studio.apps.ai_scraper | Scrape website content with AI |
| AI Crawler | oxylabs_ai_studio.apps.ai_crawler | Crawl and extract data from sites |
| AI Search | oxylabs_ai_studio.apps.ai_search | Perform AI-powered SERP searches |
| Browser Agent | oxylabs_ai_studio.apps.browser_agent | Automate browser-based tasks |
| AI Map | oxylabs_ai_studio.apps.ai_map | Map website structures |
Sources: readme.md:6-7
Quick Start Configuration
After installation, you need to configure your API key to use the SDK:
from oxylabs_ai_studio.apps.ai_scraper import AiScraper
# Initialize with your API key
scraper = AiScraper(api_key="<YOUR_API_KEY>")
# Example usage
result = scraper.scrape(
url="https://example.com",
output_format="markdown"
)
Replace <YOUR_API_KEY> with your actual Oxylabs AI Studio API key.
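Hard-coding the key in source files makes it easy to leak. One common alternative is to read it from an environment variable. The sketch below assumes a variable named `OXYLABS_AI_STUDIO_API_KEY`; that name is chosen here for illustration and is not defined by the SDK.

```python
import os

# Hypothetical variable name used for this sketch only.
ENV_VAR = "OXYLABS_AI_STUDIO_API_KEY"


def load_api_key(env=None):
    """Read the API key from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    key = env.get(ENV_VAR, "").strip()
    if not key:
        raise RuntimeError(f"Set {ENV_VAR} before creating a client")
    return key
```

The result can then be passed to the constructor shown above, for example `AiScraper(api_key=load_api_key())`.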
Environment Setup Recommendations
Virtual Environment (Recommended)
For isolated development, use a virtual environment:
# Create virtual environment
python -m venv ai-studio-env
# Activate on Linux/macOS
source ai-studio-env/bin/activate
# Activate on Windows
ai-studio-env\Scripts\activate
# Install SDK
pip install oxylabs-ai-studio
Using pyproject.toml
If you're managing a project with pyproject.toml:
[project]
name = "my-scraping-project"
version = "0.1.0"
requires-python = ">=3.10"
dependencies = [
    "oxylabs-ai-studio",
]
Dependencies
The SDK relies on the following core dependencies (automatically installed):
- httpx: HTTP client for API requests
- pydantic: Data validation using Python type hints
- Standard library modules: time, asyncio, logging
Sources: pyproject.toml
Troubleshooting
Common Issues
| Issue | Solution |
|---|---|
| ImportError | Ensure Python 3.10+ is in use and the package is installed in the active environment |
| AuthenticationError | Verify API key is correct and active |
| TimeoutError | Check network connectivity |
| pip install fails | Try upgrading pip: pip install --upgrade pip |
Upgrade Instructions
To upgrade to the latest version:
pip install --upgrade oxylabs-ai-studio
Related Documentation
- Oxylabs AI Studio - Official product page
- API Documentation - Detailed API reference
- Discord Community - Get help from the community
Sources: readme.md:1-10
Quick Start Guide
Related topics: Installation Guide, AI-Scraper Feature, AI-Crawler Feature
Overview
The Oxylabs AI Studio Python SDK (oxylabs-ai-studio) provides a simple Python interface for interacting with Oxylabs AI Studio API services. This SDK enables developers to integrate AI-powered web scraping, crawling, search, and browser automation capabilities into their Python applications with minimal configuration.
Key Features:
- AI-Scraper: Extract structured data from web pages using natural language prompts
- AI-Crawler: Automatically discover and crawl related pages starting from a URL
- AI-Search: Perform SERP (Search Engine Results Page) searches with content extraction
- Browser-Agent: Automate browser actions (clicks, scrolls, navigation) via prompts
- AI-Map: Discover website structure and find pages matching specific keywords
Requirements:
| Requirement | Version |
|---|---|
| Python | 3.10+ |
| API Key | Valid Oxylabs AI Studio API key |
Sources: readme.md:1-15
AI-Scraper Feature
Related topics: AI-Crawler Feature, Data Models
Overview
The AI-Scraper is a core feature of the Oxylabs AI Studio Python SDK designed to scrape website content and return extracted data in multiple formats. It leverages AI capabilities to intelligently extract structured or unstructured data from web pages based on natural language prompts or JSON schemas.
Purpose and Scope
The AI-Scraper provides the following capabilities:
- Flexible Output Formats: Supports Markdown, JSON, CSV, and screenshot output
- Schema-Based Extraction: Enables structured data extraction using JSON schemas or Pydantic models
- AI-Powered Parsing: Uses natural language prompts to guide data extraction
- JavaScript Rendering: Supports pages requiring client-side rendering
- Geo-Location Targeting: Allows scraping from specific geographic locations
Architecture
graph TD
A[User Request] --> B[AiScraper Class]
B --> C{Output Format}
C -->|markdown| D[Markdown Parser]
C -->|json| E[Schema Validator]
C -->|csv| F[CSV Formatter]
C -->|screenshot| G[Screenshot Capture]
D --> H[API Endpoint]
E --> H
F --> H
G --> H
H --> I[Oxylabs API]
I --> J[Response Handler]
J --> K[Structured Data]
Core Components
AiScraper Class
The main interface for web scraping operations. The class provides both synchronous and asynchronous methods for scraping web content.
Import Statement:
from oxylabs_ai_studio.apps.ai_scraper import AiScraper
Initialization:
scraper = AiScraper(api_key="<API_KEY>")
Key Methods
| Method | Description | Type |
|---|---|---|
| scrape() | Synchronous scraping operation | Sync |
| scrape_async() | Asynchronous scraping operation | Async |
| generate_schema() | Auto-generate JSON schema from prompt | Helper |
API Parameters
Required Parameters
| Parameter | Type | Description |
|---|---|---|
| url | str | Target URL to scrape |
| output_format | Literal["json", "markdown", "csv", "screenshot"] | Desired output format |
Optional Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| schema | dict \| None | None | JSON schema for structured extraction (required for "json" and "csv" formats) |
| render_javascript | bool \| "auto" | False | Enable JavaScript rendering; "auto" lets the service decide |
| geo_location | str \| None | None | Proxy location in ISO2 format or country name |
Usage Patterns
Basic Markdown Scraping
The simplest use case extracts page content as Markdown without requiring a schema.
Example (examples/scrape_markdown.py):
from oxylabs_ai_studio.apps.ai_scraper import AiScraper
scraper = AiScraper(api_key="<API_KEY>")
url = "https://sandbox.oxylabs.io/products/1"
result = scraper.scrape(
url=url,
output_format="markdown",
render_javascript=False,
geo_location="Germany",
)
print(result)
Schema-Based JSON Extraction
For structured data extraction, provide a JSON schema defining the expected output structure.
Example (examples/scrape_generated_schema.py):
from oxylabs_ai_studio.apps.ai_scraper import AiScraper
scraper = AiScraper(api_key="<API_KEY>")
schema = scraper.generate_schema(
prompt="want to parse developer, platform, type, price game title, genre (array) and description"
)
print(f"Generated schema: {schema}")
url = "https://sandbox.oxylabs.io/products/3"
result = scraper.scrape(
url=url,
output_format="json",
schema=schema,
render_javascript=False,
)
print(result)
Pydantic Model Integration
For type-safe extraction, use Pydantic models which are automatically converted to JSON schemas.
Example (examples/scrape_pydantic_schema.py):
from pydantic import BaseModel
from oxylabs_ai_studio.apps.ai_scraper import AiScraper
scraper = AiScraper(api_key="<API_KEY>")
class Game(BaseModel):
    title: str
    genre: list[str]
    developer: str
    platform: str
    game_type: str
    description: str
    price: str
    availability: str
url = "https://sandbox.oxylabs.io/products/1"
result = scraper.scrape(
url=url,
output_format="json",
schema=Game.model_json_schema(),
render_javascript=False,
)
print(result)
Async Usage
Async Interface
For high-performance applications, use the async interface:
import asyncio
from oxylabs_ai_studio.apps.ai_scraper import AiScraper
scraper = AiScraper(api_key="<API_KEY>")
async def main():
    url = "https://sandbox.oxylabs.io/products/3"
    result = await scraper.scrape_async(
        url=url,
        output_format="json",
        schema={"type": "object", "properties": {"price": {"type": "string"}}, "required": []},
        render_javascript=False,
    )
    print(result)

if __name__ == "__main__":
    asyncio.run(main())
Response Data Model
AiScraperJob Structure
| Field | Type | Description |
|---|---|---|
| run_id | str | Unique identifier for the scraping job |
| message | str \| None | Status message or error description |
| data | dict \| str \| None | Extracted data based on output format |
Data Type by Output Format
| Output Format | Data Type | Description |
|---|---|---|
| json | dict | Parsed JSON object |
| markdown | str | HTML content converted to Markdown |
| csv | str | Comma-separated values string |
| screenshot | str | Base64-encoded image data |
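Because screenshot output arrives as Base64 text, it has to be decoded before it can be written to an image file. The following is a minimal sketch using the standard library. It assumes the string is plain Base64, optionally preceded by a data-URI prefix; the image format and file name are assumptions, and both helper names are hypothetical.

```python
import base64
from pathlib import Path


def decode_screenshot(data):
    """Decode Base64 text to raw bytes, tolerating a data-URI prefix."""
    if data.startswith("data:") and "," in data:
        data = data.split(",", 1)[1]  # drop "data:image/...;base64,"
    return base64.b64decode(data)


def save_screenshot(data, path):
    """Write the decoded bytes to `path` and return the byte count."""
    raw = decode_screenshot(data)
    Path(path).write_bytes(raw)
    return len(raw)
```

With a result from `scrape(..., output_format="screenshot")`, a call such as `save_screenshot(result.data, "page.png")` would write the file, provided `result.data` is not None.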
Schema Generation
The AI-Scraper provides a generate_schema() helper method that uses AI to create appropriate JSON schemas from natural language prompts.
schema = scraper.generate_schema(
prompt="proxy plans which have name, price, and features"
)
Parameters:
| Parameter | Type | Description |
|---|---|---|
| prompt | str | Natural language description of desired data structure |
Returns: dict - A valid JSON schema object
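For orientation, a JSON schema for the prompt above could look like the dictionary below. This is a hand-written illustration and not actual output of generate_schema(); the real result may use different field names, nesting, or constraints.

```python
import json

# Hand-written illustration; a real generate_schema() result may differ.
proxy_plans_schema = {
    "type": "object",
    "properties": {
        "proxy_plans": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "price": {"type": "string"},
                    "features": {"type": "array", "items": {"type": "string"}},
                },
                "required": ["name", "price"],
            },
        }
    },
    "required": ["proxy_plans"],
}


def top_level_fields(schema):
    """List the property names declared at the top level of a schema."""
    return sorted(schema.get("properties", {}))


if __name__ == "__main__":
    print(json.dumps(proxy_plans_schema, indent=2))
```

A dictionary of this shape is what the schema parameter of scrape() expects, so a hand-written schema can be used in place of a generated one.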
Workflow Diagram
sequenceDiagram
participant User
participant AiScraper
participant API
participant Response
User->>AiScraper: scrape(url, output_format, schema)
AiScraper->>AiScraper: Validate parameters
AiScraper->>API: POST request with payload
API->>API: Process scraping request
API->>Response: Return extracted data
Response->>AiScraper: AiScraperJob response
AiScraper->>User: Return result object
Configuration Options
JavaScript Rendering
The render_javascript parameter controls browser rendering behavior:
| Value | Behavior |
|---|---|
| False | No JavaScript rendering (default) |
| True | Always render JavaScript |
| "auto" | Service automatically detects if rendering is needed |
Geo-Location
Specify geographic location for proxy-based scraping:
result = scraper.scrape(
    url="https://example.com",
    output_format="markdown",  # required parameter
    geo_location="Germany",  # country name, or "DE" in ISO2 format
)
Error Handling
When a scraping operation fails, the response will include:
- run_id: The job identifier for troubleshooting
- message: Error description
- data: None when an error occurs
Always check the message field before accessing data:
result = scraper.scrape(url=url, output_format="json", schema=schema)
if result.message:
    print(f"Error: {result.message}")
else:
    print(result.data)
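When many calls need the same check, it can be wrapped in a small helper that raises instead of printing. The sketch below uses a stand-in dataclass with the three fields listed for AiScraperJob so that it runs without the SDK. `JobResult`, `ScrapeFailed`, and `unwrap` are hypothetical names, and treating any non-empty message as a failure is an assumption carried over from the check above.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class JobResult:
    """Stand-in with the three fields listed for AiScraperJob."""
    run_id: str
    message: Optional[str] = None
    data: Any = None


class ScrapeFailed(RuntimeError):
    """Hypothetical exception type defined for this sketch only."""


def unwrap(job):
    """Return job.data, or raise if the job reports an error or no data."""
    if job.message:
        raise ScrapeFailed(f"run {job.run_id}: {job.message}")
    if job.data is None:
        raise ScrapeFailed(f"run {job.run_id}: no data returned")
    return job.data
```

Including run_id in the exception text keeps the job identifier available for troubleshooting.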
Best Practices
- Use Appropriate Schemas: Always provide a valid JSON schema when using output_format="json" or output_format="csv"
- Enable JS Rendering When Needed: Set render_javascript=True for SPAs and dynamic content
- Specify Geo-Location: Use the geo_location parameter when location-specific content is required
- Handle Errors Gracefully: Always check the message field in the response
Summary
The AI-Scraper feature provides a powerful, flexible interface for web content extraction within the Oxylabs AI Studio ecosystem. With support for multiple output formats, schema-based extraction, and both synchronous and asynchronous operation modes, it serves as a versatile tool for various web scraping use cases.
Source: https://github.com/oxylabs/oxylabs-ai-studio-py / Human Manual
AI-Crawler Feature
Related topics: AI-Scraper Feature, AI-Map Feature
Overview
The AI-Crawler is a web crawling and content extraction module within the Oxylabs AI Studio Python SDK. It enables intelligent, AI-powered website crawling with natural language prompts to guide content extraction. The crawler navigates starting URLs, discovers relevant pages based on user-defined prompts, and returns structured or unstructured data in multiple formats.
Key Characteristics:
- Natural language-based extraction guidance via user_prompt
- Multi-format output support (JSON, Markdown, CSV, Toon)
- JavaScript rendering capability for dynamic web pages
- Geographic localization through proxy positioning
- Schema-driven structured extraction with optional automatic schema generation
- Polling-based async job completion handling with configurable timeout
Sources: src/oxylabs_ai_studio/apps/ai_crawler.py:1-31
Architecture
Class Hierarchy
graph TD
A[OxyStudioAIClient] --> B[AiCrawler]
B --> C[AiCrawlerJob]
B1[BaseModel] --> C
The AiCrawler class inherits from OxyStudioAIClient, which provides the underlying API client functionality including authentication, request handling, and response parsing.
Sources: src/oxylabs_ai_studio/apps/ai_crawler.py:23-31
Data Models
class AiCrawlerJob(BaseModel):
    run_id: str
    message: str | None = None
    data: list[dict[str, Any]] | list[str] | None = None
| Field | Type | Description |
|---|---|---|
| run_id | str | Unique identifier for the crawl job |
| message | str \| None | Error code or status message if job failed |
| data | list[dict[str, Any]] \| list[str] \| None | Extracted content based on output format |
Sources: src/oxylabs_ai_studio/apps/ai_crawler.py:27-30
Configuration Constants
| Constant | Value | Purpose |
|---|---|---|
| CRAWLER_TIMEOUT_SECONDS | 600 (10 minutes) | Maximum time to wait for job completion |
| POLL_INTERVAL_SECONDS | 5 | Interval between status checks |
| POLL_MAX_ATTEMPTS | 120 | Maximum polling attempts before timeout |
Sources: src/oxylabs_ai_studio/apps/ai_crawler.py:12-14
Core Methods
`crawl()`
The primary method for initiating a crawl operation.
def crawl(
self,
url: str,
user_prompt: str,
output_format: Literal["json", "markdown", "csv", "toon"] = "markdown",
schema: dict[str, Any] | None = None,
render_javascript: bool = False,
return_sources_limit: int = 25,
geo_location: str | None = None,
max_credits: int | None = None,
) -> AiCrawlerJob
#### Parameters
| Parameter | Type | Default | Required | Description |
|---|---|---|---|---|
| url | str | - | Yes | Starting URL to crawl |
| user_prompt | str | - | Yes | Natural language prompt to guide extraction |
| output_format | Literal["json", "markdown", "csv", "toon"] | "markdown" | No | Desired output format |
| schema | dict[str, Any] \| None | None | Conditional | JSON schema for structured extraction (required for json, csv, toon formats) |
| render_javascript | bool | False | No | Enable JavaScript rendering |
| return_sources_limit | int | 25 | No | Maximum number of sources to return |
| geo_location | str \| None | None | No | Proxy location in ISO2 format or country name |
| max_credits | int \| None | None | No | Maximum credits to consume |
#### Validation Rules
if output_format in ["json", "csv", "toon"] and schema is None:
    raise ValueError(
        "openapi_schema is required when output_format is json, csv or toon.",
    )
When using json, csv, or toon output formats, a valid JSON schema must be provided. Markdown format does not require a schema.
Sources: src/oxylabs_ai_studio/apps/ai_crawler.py:47-52
`generate_schema()`
Automatically generates a JSON schema based on a natural language prompt.
def generate_schema(self, prompt: str) -> dict[str, Any] | None
| Parameter | Type | Required | Description |
|---|---|---|---|
| prompt | str | Yes | Natural language description of desired data structure |
Returns: A dictionary containing the generated JSON schema.
Process Flow:
- Sends the prompt to the /crawl/generate-params endpoint
- Validates the response status code (must be 200)
- Parses and returns the schema response
Sources: src/oxylabs_ai_studio/apps/ai_crawler.py:89-103
Workflow
sequenceDiagram
participant User
participant AiCrawler
participant API
participant PollService
User->>AiCrawler: crawl(url, user_prompt, output_format, schema)
AiCrawler->>API: POST /crawl/run
API-->>AiCrawler: run_id
AiCrawler->>PollService: Start polling
PollService->>API: GET /crawl/run/data?run_id=xxx
alt Status: processing
API-->>PollService: 202 Accepted
PollService->>PollService: wait(POLL_INTERVAL_SECONDS)
PollService->>API: GET /crawl/run/data
end
alt Status: completed
API-->>PollService: 200 + data
PollService-->>User: AiCrawlerJob(data)
else Status: failed
API-->>PollService: 200 + failed status
PollService-->>User: AiCrawlerJob(message=error)
else Timeout
PollService-->>User: TimeoutError
end
Job Completion States
| Status | Response | Action |
|---|---|---|
| processing | 202 | Continue polling at POLL_INTERVAL_SECONDS |
| completed | 200 with data | Return AiCrawlerJob with extracted data |
| failed | 200 with failed | Return AiCrawlerJob with error message |
| Timeout | After 10 minutes | Raise TimeoutError |
| KeyboardInterrupt | User cancels | Log and re-raise KeyboardInterrupt |
Sources: src/oxylabs_ai_studio/apps/ai_crawler.py:54-85
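The polling behaviour in the table can be sketched as a generic loop. This is not the SDK's implementation: `fetch` is a hypothetical callable returning a `(status_code, payload)` pair, and the sleep function is injectable so the loop can be exercised without waiting. The constants reuse the values from the Configuration Constants table.

```python
import time

POLL_INTERVAL_SECONDS = 5
POLL_MAX_ATTEMPTS = 120  # 600 seconds / 5 seconds per attempt


def poll_until_done(fetch, interval=POLL_INTERVAL_SECONDS,
                    max_attempts=POLL_MAX_ATTEMPTS, sleep=time.sleep):
    """Call fetch() until it stops answering 202, or attempts run out.

    fetch is a hypothetical callable returning (status_code, payload).
    """
    for _ in range(max_attempts):
        status_code, payload = fetch()
        if status_code == 202:  # still processing: wait, then retry
            sleep(interval)
            continue
        if status_code == 200:  # completed or failed; caller inspects payload
            return payload
        raise RuntimeError(f"unexpected status {status_code}")
    raise TimeoutError(f"no result after {max_attempts} attempts")
```

Bounding the loop by attempt count rather than wall-clock time keeps the timeout predictable: 120 attempts at 5 seconds each gives the 10-minute limit.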
Usage Examples
Basic Markdown Crawl
from oxylabs_ai_studio.apps.ai_crawler import AiCrawler
crawler = AiCrawler(api_key="<API_KEY>")
url = "https://oxylabs.io"
result = crawler.crawl(
url=url,
user_prompt="Find all pages with proxy products pricing",
output_format="markdown",
render_javascript=False,
return_sources_limit=3,
geo_location="France",
)
print("Results:")
for item in result.data:
    print(item, "\n")
Sources: examples/crawl_markdown.py:1-18
JSON Extraction with Generated Schema
from oxylabs_ai_studio.apps.ai_crawler import AiCrawler
crawler = AiCrawler(api_key="<API_KEY>")
schema = crawler.generate_schema(
prompt="proxy plans which have name, price, and features",
)
print("schema: ", schema)
url = "https://oxylabs.io"
result = crawler.crawl(
url=url,
user_prompt="Find all pages with proxy products pricing",
output_format="json",
schema=schema,
render_javascript=False,
)
print("Results:")
for item in result.data:
    print(item, "\n")
Sources: examples/crawl_generated_schema.py:1-24
Structured Extraction with Pydantic Schema
from pydantic import BaseModel, Field
from oxylabs_ai_studio.apps.ai_crawler import AiCrawler
crawler = AiCrawler(api_key="<API_KEY>")
class ProxyPlan(BaseModel):
    name: str = Field(description="The name of the proxy plan")
    price: str = Field(description="The price of the proxy plan")
    features: list[str] = Field(description="The features of the proxy plan")

class ProxyPlans(BaseModel):
    proxy_plans: list[ProxyPlan] = Field(description="The proxy plans")
url = "https://oxylabs.io/"
result = crawler.crawl(
url=url,
user_prompt="Find all pages with proxy products pricing",
output_format="json",
schema=ProxyPlans.model_json_schema(),
render_javascript=False,
)
Sources: examples/crawl_pydantic_schema.py:1-28
Output Formats
| Format | Schema Required | Data Type in AiCrawlerJob.data | Use Case |
|---|---|---|---|
| markdown | No | list[str] | Content summarization, human-readable output |
| json | Yes | list[dict[str, Any]] | Structured data processing, API integration |
| csv | Yes | list[dict[str, Any]] | Spreadsheet imports, tabular analysis |
| toon | Yes | list[dict[str, Any]] | Specialized structured format |
Sources: src/oxylabs_ai_studio/apps/ai_crawler.py:41-46
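Since the structured formats are listed as returning a list of dictionaries, writing them to a CSV file for spreadsheet import is a short step with the standard library. A minimal sketch, assuming the dictionaries are flat; nested values are written as their string form. The helper name `rows_to_csv` is hypothetical.

```python
import csv
import io


def rows_to_csv(rows):
    """Serialize a list of flat dicts to CSV text.

    The header is the union of all keys, in first-seen order;
    missing values are written as empty cells.
    """
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

For a crawl result, `rows_to_csv(result.data or [])` guards against data being None when the job failed.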
Geographic Localization
The geo_location parameter supports multiple formats:
| Format | Example | Description |
|---|---|---|
| ISO 2-letter code | "US" | US, GB, DE, FR, etc. |
| Country canonical name | "United States" | Capitalized full name |
| Coordinate formats | See SERP Localization docs | Advanced localization |
Sources: readme.md
Error Handling
Schema Validation Error
# This raises ValueError
result = crawler.crawl(
url="https://example.com",
user_prompt="Extract prices",
output_format="json",
schema=None, # Missing schema
)
# ValueError: openapi_schema is required when output_format is json, csv or toon.
Timeout Handling
try:
    result = crawler.crawl(
        url="https://example.com",
        user_prompt="Extract all products",
        output_format="markdown",
    )
except TimeoutError as e:
    print(f"Crawl failed: {e}")
    # Handle timeout - consider retrying with reduced scope
Keyboard Interrupt
When a user cancels the operation mid-polling, the crawler logs the cancellation and re-raises the KeyboardInterrupt:
except KeyboardInterrupt:
    logger.info("[Cancelled] Crawling was cancelled by user.")
    raise KeyboardInterrupt from None
Sources: src/oxylabs_ai_studio/apps/ai_crawler.py:80-82
Best Practices
1. Set Appropriate Source Limits
# Limit to most relevant sources
result = crawler.crawl(
url="https://ecommerce.example.com",
user_prompt="Product pages with pricing",
return_sources_limit=10, # Balance between coverage and performance
)
2. Use Specific Prompts
# Good: Specific and actionable
result = crawler.crawl(
url="https://example.com",
user_prompt="Find all blog posts published in 2024 with author names and publication dates",
)
# Less effective: Too vague
result = crawler.crawl(
url="https://example.com",
user_prompt="Find stuff",
)
3. Handle JavaScript Rendering Selectively
# Only enable if necessary - adds latency
result = crawler.crawl(
url="https://spa.example.com",
user_prompt="Extract dashboard metrics",
render_javascript=True, # Required for SPAs
)
4. Credit Management
# Set maximum credits for cost control
result = crawler.crawl(
url="https://example.com",
user_prompt="Extract product data",
max_credits=100, # Prevents runaway costs
)
Related Features
| Feature | Module | Purpose |
|---|---|---|
| AI-Scraper | AiScraper | Single-page extraction without crawling |
| AI-Search | AiSearch | Search engine result extraction |
| AI-Map | AiMap | URL discovery and site mapping |
| Browser-Agent | BrowserAgent | Interactive browser automation |
Sources: readme.md
AI-Search Feature
Related topics: AI-Scraper Feature, Client Architecture
Overview
The AI-Search feature provides a programmatic interface for performing AI-powered search engine results page (SERP) queries. It enables users to search for information and retrieve results with optional full content extraction, JavaScript rendering support, and geographic localization.
The feature offers two search modes:
- Standard Search: A polling-based approach for retrieving comprehensive search results with content
- Instant Search: A lightweight endpoint optimized for quick results (up to 10 results) without content
Sources: src/oxylabs_ai_studio/apps/ai_search.py:1-50
Architecture
Class Hierarchy
The AI-Search feature is built on the OxyStudioAIClient base class, which provides HTTP client functionality and API communication capabilities.
class AiSearch(OxyStudioAIClient):
    """AI Search app."""
Sources: src/oxylabs_ai_studio/apps/ai_search.py:37-38
Module Structure
| Component | File | Responsibility |
|---|---|---|
| AiSearch | ai_search.py | Main interface (synchronous and asynchronous methods) |
| AiSearchJob | ai_search.py | Response data model |
| SearchResult | ai_search.py | Individual result data model |
Data Models
SearchResult
Represents a single search result entry.
| Field | Type | Description |
|---|---|---|
| url | str | The URL of the search result |
| title | str | The title of the search result |
| description | str | The description/snippet of the search result |
| content | str \| None | Full content of the page (when return_content=True) |
Sources: src/oxylabs_ai_studio/apps/ai_search.py:22-27
AiSearchJob
Represents the complete search job response.
| Field | Type | Description |
|---|---|---|
| run_id | str | Unique identifier for the search job |
| message | str \| None | Status message or error code |
| data | list[SearchResult] \| None | List of search results |
Sources: src/oxylabs_ai_studio/apps/ai_search.py:29-31
API Methods
Synchronous Interface
#### search()
Performs a standard search with polling until results are available.
def search(
self,
query: str,
limit: int = 10,
render_javascript: bool = False,
return_content: bool = True,
geo_location: str | None = None,
) -> AiSearchJob
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| query | str | required | The search query string |
| limit | int | 10 | Maximum number of results (max: 50) |
| render_javascript | bool | False | Enable JavaScript rendering |
| return_content | bool | True | Include full content in results |
| geo_location | str \| None | None | Geographic location for localized results |
Return Type: AiSearchJob
Sources: src/oxylabs_ai_studio/apps/ai_search.py:43-55
#### instant_search()
Performs a fast search using the instant endpoint without polling.
def instant_search(
self,
query: str,
limit: int = 10,
geo_location: str | None = None,
) -> AiSearchJob
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| query | str | required | The search query string |
| limit | int | 10 | Maximum number of results (max: 10) |
| geo_location | str \| None | None | Geographic location for localized results |
Note: The standard search() method also routes to the instant endpoint, bypassing the polling mechanism, when limit <= 10 and return_content=False.
Sources: src/oxylabs_ai_studio/apps/ai_search.py:95-106
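The routing rule in the note can be written as a small pure function. This is a sketch of the decision only, not the SDK's code; the function and constant names are hypothetical, while the two paths are the ones listed under API Endpoints.

```python
INSTANT_MAX_RESULTS = 10  # instant search returns at most 10 results


def choose_endpoint(limit, return_content):
    """Pick the endpoint path for a search request.

    Small requests that do not need page content go to the
    instant endpoint; everything else uses the polled job endpoint.
    """
    if limit <= INSTANT_MAX_RESULTS and not return_content:
        return "/search/instant"
    return "/search/run"
```

In practice this means that asking for page content, or for more than 10 results, always incurs the polling wait.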
Asynchronous Interface
#### search_async()
Async version of the standard search method.
async def search_async(
self,
query: str,
limit: int = 10,
render_javascript: bool = False,
return_content: bool = True,
geo_location: str | None = None,
) -> AiSearchJob
#### instant_search_async()
Async version of the instant search method.
async def instant_search_async(
self,
query: str,
limit: int = 10,
geo_location: str | None = None,
) -> AiSearchJob
Sources: src/oxylabs_ai_studio/apps/ai_search.py:108-148
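The main benefit of the async methods is that several searches can be awaited concurrently. The sketch below shows the pattern with `asyncio.gather`. To stay runnable without an API key it uses a stand-in coroutine, `fake_search`; in real use the assumption is that `search.search_async` would be passed as `search_fn` instead.

```python
import asyncio


async def fake_search(query, limit=10):
    """Stand-in coroutine for this sketch; not part of the SDK."""
    await asyncio.sleep(0)  # yield control, as a network call would
    return {"query": query, "limit": limit}


async def search_many(queries, search_fn=fake_search, limit=5):
    """Run one search per query concurrently; results keep input order."""
    tasks = [search_fn(query, limit=limit) for query in queries]
    return await asyncio.gather(*tasks)


if __name__ == "__main__":
    results = asyncio.run(search_many(["lasagna recipe", "carbonara recipe"]))
    for item in results:
        print(item)
```

`asyncio.gather` returns results in the order the awaitables were passed, so each result lines up with its query.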
Workflow and State Management
Standard Search Polling Flow
graph TD
A[Start search] --> B[Call /search/run API]
B --> C[Extract run_id]
C --> D[Call /search/run/data API]
D --> E{Status Check}
E -->|202 Pending| F[Wait POLL_INTERVAL_SECONDS]
E -->|200 Completed| G[Return AiSearchJob with data]
E -->|200 Failed| H[Return AiSearchJob with error]
F --> D
H --> I[End with error]
G --> J[End success]
style A fill:#e1f5ff
style G fill:#c8e6c9
style H fill:#ffcdd2
Instant Search Flow
graph TD
A[Start instant_search] --> B[Call /search/instant API]
B --> C{Status 200?}
C -->|Yes| D[Parse response JSON]
C -->|No| E[Raise Exception]
D --> F[Return AiSearchJob]
F --> G[End success]
E --> H[End with error]
style A fill:#e1f5ff
style F fill:#c8e6c9
style E fill:#ffcdd2
Endpoint Selection Logic
graph TD
A[search called] --> B{limit <= 10?}
B -->|Yes| C{return_content == False?}
B -->|No| D[Use standard /search/run]
C -->|Yes| E[Use instant /search/instant]
C -->|No| D
E --> F[Return immediately]
D --> G[Start polling]
G --> H{Status completed?}
H -->|Yes| I[Return results]
H -->|No| J{Status failed?}
J -->|Yes| K[Return with error]
J -->|No| L[Continue polling]
L --> G
style A fill:#e1f5ff
style E fill:#c8e6c9
style K fill:#ffcdd2
Configuration Constants
| Constant | Value | Description |
|---|---|---|
| SEARCH_TIMEOUT_SECONDS | 180 (60 * 3) | Maximum time to wait for search completion |
| POLL_INTERVAL_SECONDS | 5 | Time between polling attempts |
| POLL_MAX_ATTEMPTS | 36 | Maximum number of polling attempts |
Sources: src/oxylabs_ai_studio/apps/ai_search.py:11-13
Usage Examples
Search with Content
Retrieves search results including full page content:
from oxylabs_ai_studio.apps.ai_search import AiSearch
search = AiSearch(api_key="<API_KEY>")
query = "lasagna recipe"
result = search.search(
query=query,
limit=5,
render_javascript=False,
return_content=True,
)
print(result.data)
Sources: examples/search_with_content.py:1-15
Search Without Content
Performs a lightweight search returning only URL, title, and description:
from oxylabs_ai_studio.apps.ai_search import AiSearch
search = AiSearch(api_key="<API_KEY>")
query = "lasagna"
result = search.search(
query=query,
limit=5,
render_javascript=False,
return_content=False,
geo_location="Italy",
)
print(result.data)
Sources: examples/search_no_content.py:1-17
Instant Search
Fast search for up to 10 results with geographic localization:
from oxylabs_ai_studio.apps.ai_search import AiSearch
search = AiSearch(api_key="<API_KEY>")
query = "lasagna recipes"
result = search.instant_search(
query=query,
limit=5,
geo_location="United States",
)
print(result.data)
Sources: examples/search_instant.py:1-14
Geographic Localization
The geo_location parameter supports multiple formats:
| Format | Example |
|---|---|
| ISO 2-letter code | "US", "DE", "FR" |
| Country canonical name | "United States", "Germany", "France" |
| Coordinate formats | Supported per SERP Localization docs |
Supported locations are documented at: SERP Localization
Error Handling
| Scenario | Behavior |
|---|---|
| Empty query | Raises ValueError("query is required") |
| API returns non-200 status | Raises Exception with response text |
| Search timeout | Raises TimeoutError |
| Keyboard interrupt | Logs cancellation and re-raises |
Sources: src/oxylabs_ai_studio/apps/ai_search.py:77-82
Key Implementation Details
Request Body Construction
Both search methods construct a standardized request body:
body = {
"query": query,
"limit": limit,
"render_javascript": render_javascript,
"return_content": return_content,
"geo_location": geo_location,
}
API Endpoints
| Endpoint | Method | Purpose |
|---|---|---|
| /search/run | POST | Create a new search job |
| /search/run/data | GET | Poll for search results |
| /search/instant | POST | Execute instant search |
Timeout Calculation
POLL_MAX_ATTEMPTS = SEARCH_TIMEOUT_SECONDS // POLL_INTERVAL_SECONDS
# 180 // 5 = 36 attempts
AI-Map Feature
Related topics: AI-Crawler Feature, Client Architecture
Overview
The AI-Map feature is a URL discovery and site mapping tool within the Oxylabs AI Studio Python SDK. It enables users to explore website structures by mapping URLs based on specified keywords, crawl depth, and filtering criteria. The feature automatically discovers URLs from sitemaps and linked pages, returning a structured list of discovered endpoints that match user-defined search parameters.
AI-Map serves as the first step in many web scraping workflows, helping users understand the structure of a target website before proceeding with detailed content extraction using tools like AiCrawler or AiScraper.
Sources: readme.md
Core Functionality
The AiMap class provides a single primary method: map(). This method accepts a comprehensive configuration payload that controls URL discovery behavior. The feature supports:
- Keyword-based filtering: Filter discovered URLs by search keywords or natural language prompts
- Crawl depth control: Limit how deep the mapping exploration goes (1-5 levels)
- Result limiting: Cap the total number of URLs returned
- Geographic targeting: Discover URLs with specific geo-location configurations
- JavaScript rendering: Enable JS rendering for dynamically loaded links
- Sitemap integration: Include or exclude sitemap-based URL discovery
- Domain scope control: Allow or restrict subdomains and external domains
Sources: examples/ai_map.py
Architecture
graph TD
A[User calls ai_map.map payload] --> B[AiMap.map method]
B --> C[Build request payload]
C --> D[POST to /map endpoint]
D --> E{Response status?}
E -->|pending| F[Poll for completion]
E -->|completed| G[Return AiMapJob]
E -->|failed| H[Return error]
F --> E
G --> I[Extract result.data]
H --> J[Raise exception]
style A fill:#e1f5ff
style G fill:#c8e6c9
style J fill:#ffcdd2
Class Reference
AiMap
Module: oxylabs_ai_studio.apps.ai_map
Constructor:
AiMap(api_key: str)
| Parameter | Type | Description |
|---|---|---|
| api_key | str | Oxylabs API key for authentication (required) |
Sources: examples/ai_map.py
map() Method
Signature:
def map(self, **payload) -> AiMapJob
Parameters Table:
| Parameter | Type | Default | Required | Description |
|---|---|---|---|---|
| url | str | - | Yes | Starting URL or domain to map |
| search_keywords | list[str] | None | No | Keywords for URL path filtering |
| user_prompt | str \| None | None | No | Natural language prompt for keyword search. Can be used together with search_keywords or standalone |
| max_crawl_depth | int | 1 | No | Maximum crawl depth (range: 1-5) |
| limit | int | 25 | No | Maximum number of URLs to return |
| geo_location | str | None | No | Proxy location in ISO2 format or country canonical name |
| render_javascript | bool | False | No | Enable JavaScript rendering for dynamic content |
| include_sitemap | bool | True | No | Include sitemap as a seed source for URL discovery |
| max_credits | int \| None | None | No | Maximum credits to use for this operation |
| allow_subdomains | bool | False | No | Allow mapping of subdomain URLs |
| allow_external_domains | bool | False | No | Allow mapping of external domain URLs |
Sources: readme.md
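Because map() takes its configuration as keyword arguments, it can help to assemble and sanity-check the payload before the call. The sketch below assumes the defaults and the 1-5 depth range listed in the table; `build_map_payload` is a hypothetical helper, not part of the SDK, and the API may apply its own validation.

```python
from typing import Any


def build_map_payload(url: str, **overrides: Any) -> dict[str, Any]:
    """Merge caller overrides into the documented defaults and check them.

    Hypothetical helper; the defaults mirror the parameters table.
    """
    payload: dict[str, Any] = {
        "url": url,
        "search_keywords": None,
        "user_prompt": None,
        "max_crawl_depth": 1,
        "limit": 25,
        "geo_location": None,
        "render_javascript": False,
        "include_sitemap": True,
        "max_credits": None,
        "allow_subdomains": False,
        "allow_external_domains": False,
    }
    unknown = set(overrides) - set(payload)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    payload.update(overrides)
    if not payload["url"]:
        raise ValueError("url is required")
    if not 1 <= payload["max_crawl_depth"] <= 5:
        raise ValueError("max_crawl_depth must be between 1 and 5")
    return payload


payload = build_map_payload("https://oxylabs.io", search_keywords=["blog"], max_crawl_depth=3)
```

The resulting dictionary can then be unpacked into the call, for example `ai_map.map(**payload)`.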
Usage Examples
Basic URL Mapping
from oxylabs_ai_studio.apps.ai_map import AiMap
ai_map = AiMap(api_key="<API_KEY>")
payload = {
    "url": "https://oxylabs.io",
    "search_keywords": ["blog"],
    "max_crawl_depth": 3,
    "limit": 50,
    "render_javascript": False,
    "include_sitemap": True,
    "max_credits": None,
    "allow_subdomains": False,
    "allow_external_domains": False,
}
result = ai_map.map(**payload)
print(result.data)
Sources: examples/ai_map.py
Mapping Career Pages
from oxylabs_ai_studio.apps.ai_map import AiMap
ai_map = AiMap(api_key="<API_KEY>")
payload = {
    "url": "https://career.oxylabs.io",
    "search_keywords": ["career", "jobs", "vacancy"],
    "user_prompt": "job ad pages",
    "max_crawl_depth": 2,
    "limit": 10,
    "geo_location": "Germany",
    "render_javascript": False,
    "include_sitemap": True,
    "max_credits": None,
    "allow_subdomains": False,
    "allow_external_domains": False,
}
result = ai_map.map(**payload)
print(result.data)
Sources: readme.md
Response Model
AiMapJob
| Field | Type | Description |
|---|---|---|
| run_id | str | Unique identifier for this mapping job |
| message | str \| None | Status message or error code |
| data | list \| None | Discovered URLs matching the search criteria |
Workflow Diagram: Complete Scraping Pipeline
graph LR
A[Define target domain] --> B[Use AiMap to discover URLs]
B --> C{URLs discovered?}
C -->|Yes| D[Filter and select URLs]
C -->|No| E[Adjust keywords/depth]
E --> B
D --> F[Use AiCrawler to crawl content]
F --> G{Detailed extraction needed?}
G -->|Yes| H[Use AiScraper per URL]
G -->|No| I[Process crawled data]
H --> I
I --> J[Store/Analyze results]
style B fill:#fff9c4
style F fill:#c8e6c9
style H fill:#c8e6c9
Parameter Interaction
| Parameter | Affects | Interaction Notes |
|---|---|---|
| url | All | Root domain determines scope of mapping |
| max_crawl_depth | API calls, credits | Higher depth increases API usage and discovery scope |
| limit | Result size | Combined with depth to control total URL count |
| search_keywords | Filter accuracy | More specific keywords reduce false positives |
| user_prompt | AI interpretation | Works synergistically with search_keywords |
| include_sitemap | Initial URL seed | When True, sitemap URLs are added to discovery queue |
| geo_location | Content variant | URLs may vary based on geo-targeted content |
| allow_subdomains | Scope expansion | When True, expands discovery beyond main domain |
Best Practices
- Start with low crawl depth: Begin with max_crawl_depth=1 to understand basic site structure before expanding
- Use specific keywords: Combine search_keywords with user_prompt for precise URL filtering
- Set appropriate limits: Use limit to prevent excessive API usage and manage response sizes
- Enable sitemap: Keep include_sitemap=True for comprehensive initial URL discovery
- Consider geo-location: If targeting region-specific pages, specify geo_location in the initial mapping
Common Use Cases
| Use Case | Recommended Configuration |
|---|---|
| Blog post discovery | {"search_keywords": ["blog", "article"], "max_crawl_depth": 2} |
| E-commerce product pages | {"search_keywords": ["product", "shop"], "max_crawl_depth": 3} |
| Documentation site mapping | {"include_sitemap": True, "max_crawl_depth": 4} |
| Job listing discovery | {"search_keywords": ["jobs", "careers", "vacancy"], "max_crawl_depth": 2} |
| News article aggregation | {"search_keywords": ["news", "article"], "limit": 100} |
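The configurations in this table are partial payloads: each still needs a url before it can be passed to map(). One possible way to reuse them is sketched below; the preset names and the `payload_for` helper are illustrative and not part of the SDK.

```python
from typing import Any

# Presets copied from the table; the keys are illustrative names.
PRESETS: dict[str, dict[str, Any]] = {
    "blog": {"search_keywords": ["blog", "article"], "max_crawl_depth": 2},
    "ecommerce": {"search_keywords": ["product", "shop"], "max_crawl_depth": 3},
    "docs": {"include_sitemap": True, "max_crawl_depth": 4},
    "jobs": {"search_keywords": ["jobs", "careers", "vacancy"], "max_crawl_depth": 2},
    "news": {"search_keywords": ["news", "article"], "limit": 100},
}


def payload_for(use_case: str, url: str, **overrides: Any) -> dict[str, Any]:
    """Combine a preset with a target URL; explicit overrides win."""
    return {"url": url, **PRESETS[use_case], **overrides}


payload = payload_for("news", "https://example.com", limit=20)
```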
Integration with Other Features
The AI-Map feature is designed to work as part of a larger scraping pipeline. After discovering URLs, users typically proceed with:
- AiCrawler: For bulk content extraction from discovered URLs
- AiScraper: For detailed structured data extraction from individual pages
- BrowserAgent: For interactive browsing tasks requiring user-like navigation
Sources: readme.md, agentic_code_guide.md
Sources: examples/ai_map.py
Browser Agent Feature
Related topics: AI-Scraper Feature, Data Models
Overview
The Browser Agent is a powerful browser automation tool within the Oxylabs AI Studio Python SDK that enables programmatic control of web browsers to perform complex actions such as clicking, scrolling, navigating, and extracting data from dynamic web pages. Unlike traditional scraping methods, the Browser Agent accepts natural language prompts to guide its behavior, making it particularly effective for websites that require JavaScript rendering or user interaction.
The feature serves as a bridge between high-level natural language instructions and low-level browser automation, abstracting away the complexities of web interaction while maintaining flexibility for various use cases. It integrates seamlessly with other AI Studio components like schema generation to provide structured data extraction capabilities.
Architecture
Component Overview
The Browser Agent feature is built on a client-server architecture where the Python SDK acts as a thin client communicating with Oxylabs' cloud-based browser automation infrastructure.
graph TD
A[BrowserAgent Python Client] --> B[AI Studio API Gateway]
B --> C[Browser Agent Service]
C --> D[Browser Instance Pool]
D --> E[Target Website]
F[Schema Generation Service] --> B
Key Components
| Component | Location | Responsibility |
|---|---|---|
| BrowserAgent | src/oxylabs_ai_studio/apps/browser_agent.py | Main client class for browser automation |
| API Client | Base HTTP client | Handles HTTP communication with API |
| Polling Mechanism | Built-in | Monitors job status until completion |
| Schema Generator | Built-in | Creates OpenAPI schemas from prompts |
Class Hierarchy
BaseClient
└── BrowserAgent
├── run()
├── run_async()
├── generate_schema()
├── generate_schema_async()
├── call_api()
├── call_api_async()
└── get_client() / async_client()
Data Models
BrowserAgentJob
The primary output model returned by Browser Agent operations.
class DataModel(BaseModel):
    type: Literal["json", "markdown", "html", "screenshot", "csv"]
    content: dict[str, Any] | str | None
class BrowserAgentJob(BaseModel):
    run_id: str
    message: str | None = None
    data: DataModel | None = None
| Field | Type | Description |
|---|---|---|
| run_id | str | Unique identifier for the browser agent job |
| message | `str \| None` | Error message or status information if job failed |
| data | `DataModel \| None` | Contains the extracted data with type and content |
DataModel Fields
| Field | Type | Description |
|---|---|---|
| type | Literal["json", "markdown", "html", "screenshot", "csv"] | Format of the extracted content |
| content | `dict[str, Any] \| str \| None` | The actual extracted data |
API Reference
BrowserAgent Class
Import Statement:
from oxylabs_ai_studio.apps.browser_agent import BrowserAgent
Initialization:
browser_agent = BrowserAgent(api_key="<API_KEY>")
Method: `run()`
Synchronous method to execute browser agent tasks.
Signature:
def run(
    self,
    url: str,
    user_prompt: str,
    output_format: Literal["json", "markdown"] = "markdown",
    schema: dict | None = None,
    geo_location: str | None = None,
    user_agent: str | None = None,
    render_javascript: bool | str = "auto",
) -> BrowserAgentJob
Parameters:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| url | str | Yes | - | Target URL to browse and interact with |
| user_prompt | str | Yes | - | Natural language prompt describing the task to perform |
| output_format | Literal["json", "markdown"] | No | "markdown" | Desired output format for extracted data |
| schema | `dict \| None` | Conditional | None | OpenAPI JSON schema for structured extraction (required when output_format="json") |
| geo_location | `str \| None` | No | None | Proxy location in ISO2 format or country canonical name |
| user_agent | `str \| None` | No | None | Custom User-Agent request header |
| render_javascript | `bool \| str` | No | "auto" | JavaScript rendering option; can be True, False, or "auto" |
Returns: BrowserAgentJob object containing the job result
Example Usage:
browser_agent = BrowserAgent(api_key="<API_KEY>")
prompt = "Find if there is game 'super mario odyssey' in the store."
url = "https://sandbox.oxylabs.io/"
result = browser_agent.run(
    url=url,
    user_prompt=prompt,
    output_format="json",
    schema={"type": "object", "properties": {"page_url": {"type": "string"}}, "required": []},
)
print(result.data)
Sources: src/oxylabs_ai_studio/apps/browser_agent.py:1-200
Method: `run_async()`
Asynchronous method to execute browser agent tasks without blocking.
Signature:
async def run_async(
    self,
    url: str,
    user_prompt: str,
    output_format: Literal["json", "markdown"] = "markdown",
    schema: dict | None = None,
    geo_location: str | None = None,
) -> BrowserAgentJob
Example Usage:
import asyncio
from oxylabs_ai_studio.apps.browser_agent import BrowserAgent
browser_agent = BrowserAgent(api_key="<API_KEY>")
async def main():
    prompt = "Find if there is game 'super mario odyssey' in the store."
    url = "https://sandbox.oxylabs.io/"
    result = await browser_agent.run_async(
        url=url,
        user_prompt=prompt,
        output_format="json",
        schema={"type": "object", "properties": {"page_url": {"type": "string"}}, "required": []},
    )
    print(result.data)
asyncio.run(main())
Sources: src/oxylabs_ai_studio/apps/browser_agent.py:200-280
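The main benefit of run_async() is running several tasks concurrently. The sketch below shows one way to do that with asyncio.gather and a semaphore that caps the number of tasks in flight. `run_many` is a hypothetical helper, the concurrency limit of 3 is an arbitrary assumption, and `fake_run_async` is a stub standing in for `browser_agent.run_async` so the example runs offline.

```python
import asyncio
from typing import Any, Awaitable, Callable


async def run_many(
    run_async: Callable[..., Awaitable[Any]],
    urls: list[str],
    user_prompt: str,
    max_concurrency: int = 3,
) -> list[Any]:
    """Run one task per URL with at most `max_concurrency` in flight.

    `run_async` stands in for `browser_agent.run_async`; results keep input order.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def one(url: str) -> Any:
        async with semaphore:
            return await run_async(url=url, user_prompt=user_prompt)

    return await asyncio.gather(*(one(url) for url in urls))


async def fake_run_async(url: str, user_prompt: str) -> str:
    # Stub so the sketch runs without network access.
    await asyncio.sleep(0)
    return f"done: {url}"


results = asyncio.run(
    run_many(fake_run_async, ["https://a.example", "https://b.example"], "list products")
)
```

asyncio.gather returns results in the order the awaitables were passed, so each result lines up with its input URL.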
Method: `generate_schema()`
Generates a JSON schema for structured data extraction based on a natural language prompt.
Signature:
def generate_schema(self, prompt: str) -> dict[str, Any] | None
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| prompt | str | Yes | Natural language description of the data structure to extract |
Returns: Dictionary containing the generated OpenAPI schema, or None if generation fails
Example Usage:
browser_agent = BrowserAgent(api_key="<API_KEY>")
schema = browser_agent.generate_schema(
    prompt="game name, platform, review stars and price"
)
print("schema: ", schema)
Sources: src/oxylabs_ai_studio/apps/browser_agent.py:180-195
Method: `generate_schema_async()`
Asynchronous version of generate_schema().
Signature:
async def generate_schema_async(self, prompt: str) -> dict[str, Any] | None
Sources: src/oxylabs_ai_studio/apps/browser_agent.py:145-165
Execution Flow
Synchronous Execution Workflow
sequenceDiagram
participant Client as BrowserAgent Client
participant API as AI Studio API
participant Service as Browser Agent Service
Client->>API: POST /browser-agent/run
Note over API: Returns run_id (status: 201)
Client->>API: GET /browser-agent/run/data?run_id=xxx
API-->>Client: status: processing
loop Poll until complete
Client->>API: GET /browser-agent/run/data?run_id=xxx
API-->>Client: status: processing
end
API-->>Client: status: completed, data returned
Job Status States
The Browser Agent job follows a state machine pattern with the following statuses:
| Status | Description | Action |
|---|---|---|
| processing | Job is currently executing | Continue polling |
| completed | Job finished successfully | Return result |
| failed | Job encountered an error | Return error message |
| HTTP 202 | Job still initializing | Continue polling |
| HTTP 200 with no data | Unknown state | Continue polling |
Sources: src/oxylabs_ai_studio/apps/browser_agent.py:40-80
Polling Mechanism
The synchronous run() method implements a polling mechanism with the following characteristics:
- Poll Interval: Configured via the POLL_INTERVAL_SECONDS constant
- Timeout Handling: Raises TimeoutError if the job does not complete within the expected timeframe
- Interrupt Support: Catches KeyboardInterrupt to gracefully cancel operations
# Polling loop structure (simplified)
while True:
    get_response = self.call_api(...)
    resp_body = get_response.json()
    if resp_body["status"] == "completed":
        return BrowserAgentJob(run_id=run_id, data=resp_body["data"])
    if resp_body["status"] == "failed":
        return BrowserAgentJob(run_id=run_id, message=resp_body.get("error_code"))
    time.sleep(POLL_INTERVAL_SECONDS)
Use Cases
E-commerce Product Discovery
The Browser Agent excels at navigating websites that require user interaction:
schema = browser_agent.generate_schema(
    prompt="game name, platform, review stars and price"
)
prompt = "Find if there is game 'super mario odyssey' in the store. If there is, find the price. Use search bar to find the game."
result = browser_agent.run(
    url="https://sandbox.oxylabs.io/",
    user_prompt=prompt,
    output_format="json",
    schema=schema,
    geo_location="Spain",
)
Recommended Workflow for Complex Extraction
For multi-step extraction tasks, combine Browser Agent with other AI Studio tools:
- Browser Agent: Navigate to the target page and identify relevant URLs
- AiScraper: Extract structured data from identified pages
- Schema Generation: Create appropriate schemas for each extraction phase
Sources: examples/browser_agent.py
Error Handling
Exception Types
| Exception | Cause | Handling |
|---|---|---|
| TimeoutError | Job exceeded timeout threshold | Retry with exponential backoff |
| KeyboardInterrupt | User cancelled operation | Clean up and exit gracefully |
| Exception | API request failed | Check API key, network connectivity |
Error Response Handling
if resp_body["status"] == "failed":
    return BrowserAgentJob(
        run_id=run_id,
        message=resp_body.get("error_code", None),
        data=None,
    )
Schema Generation Errors
if response.status_code != 200:
    raise Exception(f"Failed to generate schema: {response.text}")
Configuration Options
Proxy Location
Specify geographic location for requests:
result = browser_agent.run(
    url="https://example.com",
    user_prompt="Extract product information",
    geo_location="Germany",  # or "DE" for ISO2 format
)
Supported formats:
- ISO 2-letter country codes (e.g., "DE", "US")
- Country canonical names (e.g., "Germany", "United States")
JavaScript Rendering
Control JavaScript rendering behavior:
| Value | Behavior |
|---|---|
| False | No JavaScript rendering (fastest) |
| True | Always render JavaScript |
| "auto" | Service automatically detects if rendering is needed |
User-Agent Customization
result = browser_agent.run(
    url="https://example.com",
    user_prompt="Navigate and extract",
    user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
)
Best Practices
- Schema Definition: Always provide a well-defined schema when using output_format="json" for predictable results
- Async for Multiple Tasks: Use run_async() when running multiple browser agents concurrently to maximize throughput
- Interrupt Handling: Wrap long-running operations in try-except blocks to handle user cancellations
- Geo-Location: Use appropriate geo_location values when targeting region-specific content
- Error Retries: Implement retry logic with exponential backoff for transient failures:

```python
import time

for attempt in range(3):
    try:
        result = browser_agent.run(url=url, user_prompt=prompt)
        break
    except TimeoutError:
        time.sleep(2 ** attempt)
```
Comparison with Other Apps
| Feature | Browser Agent | AI Scraper | AI Crawler |
|---|---|---|---|
| Navigation Actions | ✅ | ❌ | ❌ |
| JavaScript Interaction | ✅ | Configurable | Configurable |
| Pagination Handling | ✅ (manual) | Manual | Automatic |
| Single Page Focus | ✅ | ✅ | ❌ |
| Schema Generation | ✅ | ✅ | ✅ |
| Output Formats | json, markdown | json, markdown, csv, screenshot | json, markdown, csv, toon |
API Endpoints Reference
| Endpoint | Method | Purpose |
|---|---|---|
| /browser-agent/run | POST | Initiate a browser agent job |
| /browser-agent/run/data | GET | Poll job status and retrieve results |
| /browser-agent/generate-params | POST | Generate extraction schema from prompt |
See Also
- AI Scraper Feature - Single-page content extraction
- AI Crawler Feature - Multi-page website crawling
- AI Search Feature - Search engine result extraction
- AI Map Feature - Site mapping and discovery
Client Architecture
Related topics: Data Models, Error Handling and Logging
Overview
The Oxylabs AI Studio Python SDK follows a layered client architecture that provides a unified interface for interacting with various AI-powered web scraping and data extraction services. The architecture separates concerns between HTTP communication, API interaction, and application-specific logic, enabling modularity and maintainability.
The client layer serves as the foundation for all application modules (AiScraper, AiSearch, AiCrawler, BrowserAgent, AiMap) by providing shared functionality for API communication, authentication, request building, and response handling.
Architecture Components
Component Overview
| Component | File | Purpose |
|---|---|---|
| APIClient | client.py | Core HTTP client for all API communications |
| BaseApp | client.py | Abstract base class for application modules |
| Utility Functions | utils.py | Logging, retry logic, and helper utilities |
| Application Modules | apps/*.py | Domain-specific API wrappers |
Class Hierarchy
graph TD
A[APIClient] --> B[BaseApp]
B --> C[AiScraper]
B --> D[AiSearch]
B --> E[AiCrawler]
B --> F[BrowserAgent]
B --> G[AiMap]
H[Requests Session] --> A
I[Configuration] --> A
API Client (`APIClient`)
Purpose and Responsibilities
The APIClient class is the core HTTP communication layer that handles:
- Authentication: Attaches API credentials to all requests
- Connection Management: Manages HTTP session lifecycle
- Base URL Configuration: Stores the API endpoint configuration
- Request Execution: Performs HTTP calls to the Oxylabs API
Configuration Parameters
| Parameter | Type | Description | Default |
|---|---|---|---|
| api_key | str | Authentication key for Oxylabs API | Required |
| base_url | str | API base endpoint | https://ai.oxylabs.io/api/v1 |
| timeout | int | Request timeout in seconds | Configurable |
| max_retries | int | Maximum retry attempts for failed requests | Configurable |
Key Methods
graph TD
A[make_request] --> B{Method Type}
B -->|POST| C[POST Request]
B -->|GET| D[GET Request]
C --> E[Attach JSON Body]
D --> F[Attach Query Params]
E --> G[Execute Request]
F --> G
G --> H{Response Status}
H -->|2xx| I[Return Response]
H -->|4xx/5xx| J[Raise Exception]
The APIClient exposes methods that all application modules use for API communication:
- call_api(client, url, method, body, params) - Generic API call method
- get_client() - Returns configured HTTP client instance
- Session management methods for connection pooling
Base Application Class (`BaseApp`)
Purpose and Responsibilities
The BaseApp class serves as an abstract base for all application-specific modules. It provides:
- Common API Interface: Unified call_api() method across all apps
- Client Initialization: Automatic HTTP client setup with authentication
- Polling Infrastructure: Shared job status polling mechanism
- Error Handling: Standardized exception handling patterns
Polling Mechanism
All job-based operations poll a status endpoint until the job completes:
graph TD
A[Submit Job Request] --> B[Get run_id]
B --> C[Poll Status Endpoint]
C --> D{Status Check}
D -->|202 Processing| C
D -->|200 Completed| E[Return Result]
D -->|Error| F[Return Error]
G[Max Timeout] -->|Exceeded| H[Raise TimeoutError]
Polling Configuration
| Parameter | Value | Description |
|---|---|---|
| POLL_INTERVAL_SECONDS | 2 | Seconds between status checks |
| MAX_TIMEOUT_SECONDS | 300 | Maximum wait time before timeout |
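Taken together, the two constants bound the number of status checks. The sketch below shows a generic bounded polling loop under the values listed in this table. It is an illustration rather than the SDK's implementation: `poll_until_done` is a hypothetical name, `fetch` stands in for a call to the status endpoint, and the `sleep` parameter exists only so the example can run without waiting.

```python
import time
from typing import Any, Callable

POLL_INTERVAL_SECONDS = 2
MAX_TIMEOUT_SECONDS = 300


def poll_until_done(
    fetch: Callable[[], dict[str, Any]],
    interval: float = POLL_INTERVAL_SECONDS,
    timeout: float = MAX_TIMEOUT_SECONDS,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Call `fetch` until it reports a terminal status or the attempt budget runs out."""
    max_attempts = int(timeout // interval)  # 300 // 2 = 150 with the defaults
    for _ in range(max_attempts):
        body = fetch()
        if body.get("status") in ("completed", "failed"):
            return body
        sleep(interval)
    raise TimeoutError(f"job not finished after {max_attempts} attempts")


# Simulated responses: one "processing" reply followed by completion.
responses = iter([{"status": "processing"}, {"status": "completed", "data": "ok"}])
final = poll_until_done(lambda: next(responses), sleep=lambda _: None)
```

Counting attempts instead of measuring elapsed time keeps the loop simple, at the cost of ignoring how long each request itself takes.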
Application Modules
AiScraper
Located in src/oxylabs_ai_studio/apps/ai_scraper.py, this module provides structured web scraping capabilities.
Key Features:
- Single URL content extraction
- Structured JSON output with custom schemas
- Markdown, HTML, CSV, and screenshot output formats
- JavaScript rendering support
- Geo-location proxy rotation
Core Methods:
- scrape() - Synchronous scraping operation
- scrape_async() - Asynchronous scraping operation
- generate_schema() - AI-powered schema generation from natural language
AiSearch
Located in src/oxylabs_ai_studio/apps/ai_search.py, this module handles search engine result page (SERP) scraping.
Key Features:
- Full search with content retrieval
- Instant search for quick results (up to 10 results)
- Content extraction in markdown format
- Geo-location targeting for localized results
Core Methods:
- search() - Full search with content
- instant_search() - Fast search without content polling
AiCrawler
Located in src/oxylabs_ai_studio/apps/ai_crawler.py, this module provides recursive web crawling with AI-guided extraction.
Key Features:
- Multi-page crawling with depth control
- AI-guided data extraction
- Structured JSON output with generated schemas
- Source limitation and filtering
Core Methods:
- crawl() - Start crawling operation
- generate_schema() - Generate extraction schema from prompt
Request/Response Flow
Standard API Call Flow
sequenceDiagram
participant App as Application Module
participant Client as API Client
participant API as Oxylabs API
App->>Client: call_api(url, method, body)
Client->>Client: Prepare request headers
Client->>Client: Attach auth (API Key)
Client->>API: HTTP Request
API-->>Client: Response
Client-->>App: Processed Response
Job-Based Operation Flow
For long-running operations (scraping, crawling, searching):
graph TD
A[Submit Job] --> B[Get run_id]
B --> C[Loop: Poll Status]
C --> D{Response Status}
D -->|202| C
D -->|200 Completed| E[Return Data]
D -->|Failed| F[Return Error Info]
Data Models
Common Response Models
| Model | Fields | Description |
|---|---|---|
| AiScraperJob | run_id, message, data | Scraping job result |
| AiSearchJob | run_id, message, data | Search job result |
| AiCrawlerJob | run_id, message, data | Crawling job result |
| DataModel | type, content | Extracted data container |
Output Format Types
| Format | Type | Description |
|---|---|---|
| json | dict | Structured JSON output |
| markdown | str | Markdown formatted text |
| html | str | Raw HTML content |
| screenshot | str | Base64 encoded image |
| csv | str | CSV formatted data |
| toon | dict | Tabular object notation |
Authentication
The SDK uses API key-based authentication passed during initialization:
from oxylabs_ai_studio.apps.ai_scraper import AiScraper
scraper = AiScraper(api_key="<API_KEY>")
The API key is:
- Stored in the BaseApp instance configuration
- Automatically attached to all outgoing HTTP requests
- Used to authenticate against the Oxylabs AI Studio API endpoint
Configuration and Utils
Logging Configuration
Located in src/oxylabs_ai_studio/utils.py, the SDK provides structured logging for debugging and monitoring:
- Configurable log levels
- Request/response logging
- Error tracing with context
Error Handling
The architecture implements layered error handling:
| Layer | Error Type | Handling |
|---|---|---|
| Client | Network errors | Retry with backoff |
| API | HTTP errors | Exception with response details |
| Application | Business logic | Domain-specific exceptions |
Timeout Configuration
| Parameter | Default | Description |
|---|---|---|
| Request timeout | Varies by endpoint | Per-request timeout |
| Polling timeout | 300 seconds | Maximum wait for job completion |
| Poll interval | 2 seconds | Time between status checks |
Usage Patterns
Synchronous Usage
from oxylabs_ai_studio.apps.ai_scraper import AiScraper
scraper = AiScraper(api_key="<API_KEY>")
result = scraper.scrape(
    url="https://example.com",
    output_format="json",
    schema={"type": "object", ...}
)
Asynchronous Usage
import asyncio
from oxylabs_ai_studio.apps.ai_scraper import AiScraper
scraper = AiScraper(api_key="<API_KEY>")
async def main():
    result = await scraper.scrape_async(
        url="https://example.com",
        output_format="markdown"
    )
    print(result.data)
asyncio.run(main())
Summary
The Client Architecture of oxylabs-ai-studio-py provides:
- Separation of Concerns: HTTP communication isolated in APIClient
- Code Reuse: Common functionality in BaseApp for all modules
- Extensibility: Easy addition of new application modules
- Reliability: Built-in polling, retry, and timeout mechanisms
- Flexibility: Support for both sync and async operations
All application modules inherit from the shared base architecture, ensuring consistent behavior and API patterns across the SDK.
Source: https://github.com/oxylabs/oxylabs-ai-studio-py / Human Manual
Data Models
Related topics: Client Architecture, AI-Scraper Feature
The oxylabs-ai-studio-py SDK provides a set of Pydantic-based data models that standardize how API responses are structured across all applications. These models serve as the foundation for type-safe data handling, ensuring consistent response parsing regardless of which AI-powered service is being used.
Overview
The SDK implements a layered response model architecture:
| Layer | Model | Purpose |
|---|---|---|
| Container | DataModel | Wraps the actual extracted content with type metadata |
| Response | AiScraperJob, BrowserAgentJob, AiSearchJob, AiCrawlerJob | Top-level job responses containing status, run ID, and data |
This design separates concerns between job metadata (run tracking, error handling) and the actual data payload, allowing flexible content types while maintaining a consistent interface.
Core Response Models
All job response models inherit from Pydantic's BaseModel and share a common structure with three fields.
Common Fields Across All Job Models
| Field | Type | Description |
|---|---|---|
| run_id | str | Unique identifier for the API job execution |
| message | `str \| None` | Error code or status message (nullable) |
| data | Varies | The actual response payload (type depends on output format and model) |
AiScraperJob
Located in ai_scraper.py, this model handles single-page scraping responses.
class AiScraperJob(BaseModel):
    run_id: str
    message: str | None = None
    data: str | dict | None
Data Type Mapping:
| Output Format | Data Type |
|---|---|
| json | dict |
| markdown | str |
| csv | str (CSV formatted) |
| screenshot | str (base64 encoded) |
Sources: readme.md
BrowserAgentJob
Located in browser_agent.py, this model handles browser automation task responses. It differs from AiScraperJob by using a nested DataModel structure.
class DataModel(BaseModel):
    type: Literal["json", "markdown", "html", "screenshot", "csv"]
    content: dict[str, Any] | str | None
class BrowserAgentJob(BaseModel):
    run_id: str
    message: str | None = None
    data: DataModel | None = None
Supported Content Types:
- json - Structured JSON data (dict)
- markdown - Markdown formatted text (str)
- html - Raw HTML content (str)
- screenshot - Base64 encoded image (str)
- csv - CSV formatted data (str)
Sources: agentic_code_guide.md
AiSearchJob
Located in ai_search.py, this model handles search engine results pages (SERP) responses.
class AiSearchJob(BaseModel):
    run_id: str
    message: str | None = None
    data: Any  # Search results list
The data field contains a list of search results, where each result may include:
- URL
- Title
- Snippet
- Additional metadata depending on the return_content parameter
Sources: src/oxylabs_ai_studio/apps/ai_search.py
AiCrawlerJob
Located in ai_crawler.py, this model handles web crawling responses.
class AiCrawlerJob(BaseModel):
    run_id: str
    message: str | None = None
    data: list[str] | dict | None  # Multiple crawled pages
The data field contains a list of extracted content from crawled pages, formatted according to the specified output_format.
Sources: src/oxylabs_ai_studio/apps/ai_crawler.py()
DataModel Container
The DataModel class provides a unified container for content extraction, wrapping both the data type and actual content together.
classDiagram
class DataModel {
+Literal type
+content: dict~str, Any~ | str | None
}
class BrowserAgentJob {
+str run_id
+str | None message
+DataModel | None data
}
BrowserAgentJob o-- DataModel : contains
Schema Integration
The SDK supports both raw JSON schemas and Pydantic model integration for structured data extraction.
JSON Schema Usage
Pass a dictionary following JSON Schema specification:
schema = {
    "type": "object",
    "properties": {
        "price": {"type": "string"},
        "title": {"type": "string"}
    },
    "required": []
}
result = scraper.scrape(
    url="https://example.com",
    output_format="json",
    schema=schema
)
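With an empty "required" list, the extracted dict may omit some properties. A small check like the following sketch can report which schema properties are absent from a result; missing_properties is a hypothetical helper and inspects only top-level properties.

```python
def missing_properties(schema, data):
    """Return the top-level schema properties that do not appear in the extracted dict."""
    expected = schema.get("properties", {})
    return sorted(name for name in expected if name not in data)


example_schema = {
    "type": "object",
    "properties": {
        "price": {"type": "string"},
        "title": {"type": "string"},
    },
    "required": [],
}
```

For instance, missing_properties(example_schema, {"title": "Widget"}) reports that "price" was not extracted.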
Pydantic Model Usage
For type-safe extraction, use Pydantic models directly:
from pydantic import BaseModel, Field
from oxylabs_ai_studio.apps.ai_scraper import AiScraper
class Game(BaseModel):
    title: str
    genre: list[str]
    developer: str
    platform: str
    price: str
    description: str
scraper = AiScraper(api_key="<API_KEY>")
result = scraper.scrape(
    url="https://sandbox.oxylabs.io/products/1",
    output_format="json",
    schema=Game.model_json_schema(),
)
Sources: examples/scrape_pydantic_schema.py()
Response Workflow
graph TD
A[API Request] --> B{Output Format}
B -->|json| C[Structured Dict]
B -->|markdown| D[Text String]
B -->|csv| E[CSV String]
B -->|screenshot| F[Base64 String]
C --> G[Response Model]
D --> G
E --> G
F --> G
G --> H[Job Response: run_id, message, data]
Example: Accessing Response Data
from oxylabs_ai_studio.apps.ai_scraper import AiScraper
scraper = AiScraper(api_key="<API_KEY>")
# JSON output
result = scraper.scrape(
    url="https://example.com",
    output_format="json",
    schema={"type": "object", "properties": {"title": {"type": "string"}}}
)
# Access the data
print(result.run_id)   # Job identifier
print(result.message)  # Error code if any
print(result.data)     # Extracted dict content
# Markdown output
result = scraper.scrape(
    url="https://example.com",
    output_format="markdown"
)
print(result.data)  # String content
Sources: examples/scrape_generated_schema.py()
Error Handling
All job models support nullable message fields for error propagation:
result = scraper.scrape(url="https://example.com", ...)
if result.message:
    print(f"Error occurred: {result.message}")
else:
    print(f"Success: {result.data}")
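Callers that prefer exceptions over checking message after every call can wrap the pattern above in a helper. This is a sketch under the assumption that a non-empty message always signals failure; JobFailedError and unwrap are hypothetical names, and SimpleNamespace stands in for a job model in the usage note.

```python
from types import SimpleNamespace


class JobFailedError(RuntimeError):
    """Raised by unwrap() when a job carries an error message (hypothetical)."""


def unwrap(job):
    """Return job.data, or raise if the job reports an error message."""
    if job.message:
        raise JobFailedError(f"run {job.run_id} failed: {job.message}")
    return job.data
```

For example, unwrap(SimpleNamespace(run_id="r1", message=None, data={"a": 1})) returns the dict unchanged.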
Output Format Summary
| Format | Data Structure | Schema Required |
|---|---|---|
| json | dict | Yes |
| markdown | str | No |
| html | str | No |
| csv | str | Yes |
| screenshot | str (base64) | No |
| toon | Varies | Yes (Browser Agent only) |
Sources: readme.md()
Configuration and Settings
Related topics: Error Handling and Logging, Client Architecture
Overview
The oxylabs-ai-studio-py SDK provides a centralized configuration system built on Pydantic's BaseSettings class. This approach ensures type safety, environment variable validation, and sensible defaults for all configuration values. The configuration module serves as the single source of truth for API credentials and endpoint URLs that are shared across all application modules.
All SDK applications, including AiScraper, AiCrawler, AiSearch, and BrowserAgent, consume the same configuration settings, which keeps the system consistent and maintainable. Sources: src/oxylabs_ai_studio/settings.py:1-9
Core Configuration Model
The Settings class defines the available configuration parameters with their types, defaults, and validation rules.
class Settings(BaseSettings):
    OXYLABS_AI_STUDIO_API_KEY: str | None = None
    OXYLABS_AI_STUDIO_API_URL: str = "https://api-aistudio.oxylabs.io"
Configuration Parameters
| Parameter | Type | Default Value | Description |
|---|---|---|---|
| OXYLABS_AI_STUDIO_API_KEY | `str \| None` | None | API authentication key obtained from Oxylabs dashboard |
| OXYLABS_AI_STUDIO_API_URL | str | "https://api-aistudio.oxylabs.io" | Base URL for all API requests |
Sources: src/oxylabs_ai_studio/settings.py:1-9
Environment Variable Loading
The SDK automatically loads environment variables using the python-dotenv package. The load_dotenv() function is called at module import time, so values defined in a .env file are available before any configuration is accessed. Sources: src/oxylabs_ai_studio/settings.py:3
graph TD
A[Import oxylabs_ai_studio] --> B[load_dotenv executes]
B --> C[Environment Variables Loaded]
C --> D["Settings() instantiated"]
D --> E[API_KEY available to all Apps]
Application Initialization Pattern
All SDK applications accept an optional api_key parameter in their constructors. When provided, the key is used directly. When omitted, the applications retrieve the API key from the global settings object.
# Direct API key usage
scraper = AiScraper(api_key="<API_KEY>")
# Environment-based API key usage
scraper = AiScraper() # Reads from OXYLABS_AI_STUDIO_API_KEY
This dual approach provides flexibility for different deployment scenarios:
- Explicit Parameter: API key passed directly to constructor
- Environment Variable: API key loaded from the OXYLABS_AI_STUDIO_API_KEY environment variable
Sources: src/oxylabs_ai_studio/apps/ai_scraper.py src/oxylabs_ai_studio/apps/ai_crawler.py
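The precedence between the two options can be sketched with the standard library alone. The function below is an illustration of the "explicit key first, environment second" rule, not the SDK's own code; resolve_api_key and its environ parameter are hypothetical.

```python
import os

ENV_VAR = "OXYLABS_AI_STUDIO_API_KEY"


def resolve_api_key(explicit=None, environ=None):
    """Prefer an explicitly passed key, otherwise fall back to the environment variable."""
    env = os.environ if environ is None else environ
    # An empty or missing explicit key falls through to the environment value
    return explicit or env.get(ENV_VAR)
```

Passing a plain dict as environ makes the rule easy to test without touching the real process environment.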
Configuration Access in Applications
AiSearch Application
The AiSearch class initializes its HTTP client with the provided API key and uses the configured API URL for all requests.
def get_client(self) -> httpx.Client:
    return httpx.Client(
        headers={
            "Authorization": f"Bearer {self.api_key or settings.OXYLABS_AI_STUDIO_API_KEY}",
            "Content-Type": "application/json",
        },
        base_url=settings.OXYLABS_AI_STUDIO_API_URL,
        timeout=httpx.Timeout(60.0, connect=10.0),
    )
| Endpoint | HTTP Method | Purpose |
|---|---|---|
| /search | POST | Full search with content rendering |
| /search/instant | POST | Fast search returning up to 10 results |
Sources: src/oxylabs_ai_studio/apps/ai_search.py
AiCrawler Application
The crawler uses the same client configuration pattern, with the API key and base URL sourced from settings:
def get_client(self) -> httpx.Client:
    return httpx.Client(
        headers={
            "Authorization": f"Bearer {self.api_key or settings.OXYLABS_AI_STUDIO_API_KEY}",
            "Content-Type": "application/json",
        },
        base_url=settings.OXYLABS_AI_STUDIO_API_URL,
        timeout=httpx.Timeout(60.0, connect=10.0),
    )
| Endpoint | HTTP Method | Purpose |
|---|---|---|
| /crawl/run | POST | Initiate a crawl job |
| /crawl/run/data | GET | Retrieve crawl results |
| /crawl/generate-params | POST | Generate JSON schema from prompt |
Sources: src/oxylabs_ai_studio/apps/ai_crawler.py
AiScraper Application
The scraper follows the identical pattern for HTTP client initialization:
def get_client(self) -> httpx.Client:
    return httpx.Client(
        headers={
            "Authorization": f"Bearer {self.api_key or settings.OXYLABS_AI_STUDIO_API_KEY}",
            "Content-Type": "application/json",
        },
        base_url=settings.OXYLABS_AI_STUDIO_API_URL,
        timeout=httpx.Timeout(60.0, connect=10.0),
    )
| Endpoint | HTTP Method | Purpose |
|---|---|---|
| /scrape | POST | Initiate a scrape job |
| /scrape/schema | POST | Generate JSON schema from prompt |
Sources: src/oxylabs_ai_studio/apps/ai_scraper.py
HTTP Client Configuration
All applications share identical HTTP client configuration through a standardized get_client() method:
| Parameter | Value | Description |
|---|---|---|
| Authorization | Bearer {API_KEY} | OAuth 2.0 bearer token authentication |
| Content-Type | application/json | Request payload format |
| timeout | 60.0s (read), 10.0s (connect) | Request timeout configuration |
| base_url | settings.OXYLABS_AI_STUDIO_API_URL | API base endpoint |
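To make the table concrete, the sketch below assembles an equivalent request with the standard library's urllib instead of httpx, without sending it. It assumes the default base URL and uses the /scrape endpoint from the earlier table; build_request is a hypothetical helper and the SDK itself uses httpx.Client as shown above.

```python
from urllib.request import Request

BASE_URL = "https://api-aistudio.oxylabs.io"  # default OXYLABS_AI_STUDIO_API_URL


def build_request(path, api_key, body=b"{}"):
    """Build (but do not send) a POST request carrying the SDK's two headers."""
    return Request(
        url=f"{BASE_URL}{path}",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Inspecting build_request("/scrape", "<API_KEY>").full_url is a quick way to confirm which endpoint a call would hit.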
Setting Up Environment Variables
Recommended `.env` File
Create a .env file in your project root with the following content:
OXYLABS_AI_STUDIO_API_KEY=your_api_key_here
Installation and Usage Flow
graph LR
A[Install SDK<br>pip install oxylabs-ai-studio] --> B[Create .env file]
B --> C[Set OXYLABS_AI_STUDIO_API_KEY]
C --> D[Import applications]
D --> E[Initialize with or without api_key]
E --> F[Make API requests]
Configuration Best Practices
Development Environment
from oxylabs_ai_studio.apps.ai_scraper import AiScraper
# Option 1: Use .env file
scraper = AiScraper() # Automatically reads from environment
# Option 2: Explicit API key
scraper = AiScraper(api_key="your_dev_key")
Production Environment
In production deployments, use environment variables directly:
export OXYLABS_AI_STUDIO_API_KEY="your_production_key"
from oxylabs_ai_studio.apps.ai_scraper import AiScraper
scraper = AiScraper() # Uses production API key from environment
Security Considerations
- Never commit API keys to version control
- Use environment variables for production deployments
- Use .gitignore to exclude .env files
- Rotate API keys periodically through the Oxylabs dashboard
Module Exports
The SDK exports the settings object for direct access when needed:
from oxylabs_ai_studio import settings
print(settings.OXYLABS_AI_STUDIO_API_URL)
Sources: src/oxylabs_ai_studio/__init__.py
Error Handling and Logging
Related topics: Client Architecture, Configuration and Settings
Overview
The oxylabs-ai-studio-py SDK implements a comprehensive error handling and logging system that enables developers to monitor SDK operations, debug issues, and gracefully handle failures. The system is designed with simplicity in mind while providing sufficient observability for production environments.
The logging infrastructure uses Python's standard logging module with a package-scoped namespace, ensuring all SDK components can be monitored uniformly. Error handling follows a polling-based pattern for asynchronous operations, with explicit timeout management and user cancellation support.
Logging Architecture
Logger Configuration
The SDK defines a centralized logging configuration through the logger.py module.
LOGGER_NAME = "oxylabs_ai_studio"
DEFAULT_LOG_LEVEL = logging.INFO
Sources: src/oxylabs_ai_studio/logger.py:1-14
Logger Initialization
The SDK automatically configures logging upon module import using a module-level initialization pattern:
_default_logger = logging.getLogger(LOGGER_NAME)
if not _default_logger.handlers:
    configure_logging()
Sources: src/oxylabs_ai_studio/logger.py:49-52
Default Log Format
| Component | Value |
|---|---|
| Timestamp | %(asctime)s |
| Logger Name | %(name)s |
| Log Level | %(levelname)s |
| Message | %(message)s |
The default format string produces output like: 2024-01-15 10:30:45,123 - oxylabs_ai_studio - INFO - Starting scrape operation
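The format can be reproduced with the standard logging module to preview what SDK log lines look like. This sketch writes to an in-memory stream rather than stderr; the logger name "format_demo" is a placeholder chosen for illustration.

```python
import io
import logging

FORMAT = "%(asctime)s - %(name)s - %(levelname)s - %(message)s"

# Capture output in memory so the formatted line can be inspected
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter(FORMAT))

demo_logger = logging.getLogger("format_demo")
demo_logger.setLevel(logging.INFO)
demo_logger.addHandler(handler)
demo_logger.propagate = False  # keep the demo output out of the root logger

demo_logger.info("Starting scrape operation")
line = stream.getvalue().strip()
```

The captured line starts with a timestamp followed by the logger name, level and message, separated by " - ".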
Core Logging Functions
get_logger()
Returns a logger instance for the SDK. Child loggers automatically inherit the parent's configuration.
def get_logger(name: str | None = None) -> logging.Logger:
    if name is None:
        logger_name = LOGGER_NAME
    elif not name.startswith(LOGGER_NAME):
        logger_name = f"{LOGGER_NAME}.{name}"
    else:
        logger_name = name
    logger = logging.getLogger(logger_name)
    if logger_name != LOGGER_NAME:
        logger.handlers.clear()
        logger.propagate = True
    return logger
Sources: src/oxylabs_ai_studio/logger.py:16-32
configure_logging()
Configures the root SDK logger with custom settings.
def configure_logging(
    level: int = DEFAULT_LOG_LEVEL,
    format_string: str | None = None,
    handler: logging.Handler | None = None,
) -> None:
    logger = logging.getLogger(LOGGER_NAME)
    for existing_handler in logger.handlers[:]:
        logger.removeHandler(existing_handler)
    logger.setLevel(level)
    if handler is None:
        handler = logging.StreamHandler(sys.stderr)
    if format_string is None:
        format_string = "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
    formatter = logging.Formatter(format_string)
    handler.setFormatter(formatter)
    logger.addHandler(handler)
    logger.propagate = False
Sources: src/oxylabs_ai_studio/logger.py:35-48
Configuration Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| level | int | logging.INFO | Minimum log level to record |
| format_string | `str \| None` | "%(asctime)s - %(name)s - %(levelname)s - %(message)s" | Custom format pattern |
| handler | `logging.Handler \| None` | StreamHandler(sys.stderr) | Output handler |
Error Handling Patterns
Polling-Based Job Status Handling
All async operations in the SDK follow a consistent polling pattern to check job completion status.
graph TD
A[Start Job Request] --> B[Submit to API]
B --> C[Get run_id]
C --> D{Polling Loop}
D -->|HTTP 202| E[Wait POLL_INTERVAL_SECONDS]
E --> D
D -->|HTTP 200| F{Check Status}
F -->|completed| G[Return Success Data]
F -->|failed| H[Return Failure with Error Code]
F -->|processing| E
D -->|timeout| I[Raise TimeoutError]
D -->|KeyboardInterrupt| J[Log Cancellation & Raise]
Timeout Management
Each application module defines its own timeout threshold and polling configuration.
| Application | Timeout (seconds) | Poll Interval (seconds) | Max Attempts |
|---|---|---|---|
| Browser Agent | 600 (10 min) | 5 | 120 |
| AI Crawler | 600 (10 min) | 5 | 120 |
| AI Scraper | 600 (10 min) | 5 | 120 |
| AI Search | 600 (10 min) | 5 | 120 |
| AI Map | 600 (10 min) | 5 | 120 |
Sources: - src/oxylabs_ai_studio/apps/browser_agent.py:1-15
- src/oxylabs_ai_studio/apps/ai_crawler.py:1-25
- src/oxylabs_ai_studio/apps/ai_scraper.py
- src/oxylabs_ai_studio/apps/ai_search.py
- src/oxylabs_ai_studio/apps/ai_map.py
Status Response Handling
The API returns standardized status responses that the SDK interprets:
if resp_body["status"] == "completed":
    return JobResult(run_id=run_id, data=resp_body["data"])
if resp_body["status"] == "failed":
    return JobResult(run_id=run_id, message=resp_body.get("error_code"), data=None)
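Putting the polling loop, the timeout budget and the status checks together, the overall flow can be sketched as a standalone function. This is an illustration under stated assumptions rather than the SDK's implementation: fetch_status is a hypothetical callable returning (http_status, body), results are plain dicts instead of Pydantic models, and sleep is injectable so the loop can be exercised without waiting.

```python
import time

POLL_INTERVAL_SECONDS = 5
MAX_ATTEMPTS = 120  # 120 attempts x 5 seconds = 600 second budget


def poll_job(fetch_status, run_id, interval=POLL_INTERVAL_SECONDS,
             max_attempts=MAX_ATTEMPTS, sleep=time.sleep):
    """Poll until the job completes or fails; raise TimeoutError when attempts run out."""
    for _ in range(max_attempts):
        status_code, body = fetch_status(run_id)
        if status_code == 202:
            # Accepted but not ready yet: wait and poll again
            sleep(interval)
            continue
        if status_code != 200:
            raise Exception(f"Failed to fetch job {run_id}: {body}")
        if body["status"] == "completed":
            return {"run_id": run_id, "message": None, "data": body["data"]}
        if body["status"] == "failed":
            return {"run_id": run_id, "message": body.get("error_code"), "data": None}
        # Any other status (for example "processing"): keep waiting
        sleep(interval)
    raise TimeoutError(f"Job {run_id} did not finish after {max_attempts} attempts.")
```

Injecting sleep=lambda seconds: None together with a scripted fetch_status lets the loop be tested in milliseconds.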
API Error Responses
| HTTP Status | Meaning | SDK Action |
|---|---|---|
| 200 | Success | Process response body |
| 202 | Accepted, still processing | Continue polling |
| 4xx | Client error | Raise Exception with response text |
| 5xx | Server error | Raise Exception with response text |
Job Result Models
Common Response Structure
All job results follow a consistent Pydantic model structure:
class AiScraperJob(BaseModel):
    run_id: str
    message: str | None = None
    data: str | dict | None
class BrowserAgentJob(BaseModel):
    run_id: str
    message: str | None = None
    data: DataModel | None = None
class AiSearchJob(BaseModel):
    run_id: str
    message: str | None = None
    data: Any  # Populated from resp_body["data"]
class AiCrawlerJob(BaseModel):
    run_id: str
    message: str | None = None
    data: list[dict[str, Any]] | list[str] | None = None
Sources: - src/oxylabs_ai_studio/apps/ai_scraper.py
- src/oxylabs_ai_studio/apps/browser_agent.py:28-35
- src/oxylabs_ai_studio/apps/ai_search.py
- src/oxylabs_ai_studio/apps/ai_crawler.py:20-24
DataModel for Browser Agent
class DataModel(BaseModel):
    type: Literal["json", "markdown", "html", "screenshot", "csv", "toon"]
    content: dict[str, Any] | str | None
Exception Handling
ValueError Exceptions
The SDK validates input parameters and raises ValueError for missing required fields:
if output_format in ["json", "csv", "toon"] and schema is None:
    raise ValueError(
        "openapi_schema is required when output_format is json, csv or toon.",
    )
Sources: src/oxylabs_ai_studio/apps/browser_agent.py:50-54
Schema Generation Errors
if response.status_code != 200:
    raise Exception(f"Failed to generate schema: {response.text}")
Timeout Errors
raise TimeoutError(f"Failed to scrape {url}: timeout.")
raise TimeoutError(f"Failed to search {query=}")
raise TimeoutError(f"Failed to crawl {url}: timeout.")
raise TimeoutError(f"Failed to map {url}: timeout.")
API Call Errors
if status_code != 200:
    raise Exception(f"Failed to perform instant search: `{response.text}`")
User Cancellation
The SDK gracefully handles KeyboardInterrupt exceptions:
except KeyboardInterrupt:
    logger.info("[Cancelled] Scraping was cancelled by user.")
    raise KeyboardInterrupt from None
| Exception Type | Trigger | User Message |
|---|---|---|
| ValueError | Missing required parameter | Parameter-specific message |
| Exception | API returns non-200 status | API response text |
| TimeoutError | Job exceeds timeout threshold | Operation-specific timeout message |
| KeyboardInterrupt | User cancels operation | "[Cancelled] {operation} was cancelled by user." |
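Of the exceptions in the table, TimeoutError is the one most likely to succeed on a second attempt. The sketch below shows a caller-side retry wrapper; retrying is not something the source describes the SDK doing itself, and call_with_retry and its parameters are hypothetical.

```python
import time


def call_with_retry(operation, retries=2, delay=1.0, sleep=time.sleep):
    """Run operation(), retrying only on TimeoutError, up to `retries` extra attempts."""
    last_error = None
    for attempt in range(retries + 1):
        try:
            return operation()
        except TimeoutError as error:
            last_error = error
            if attempt < retries:
                sleep(delay)  # brief pause before the next attempt
    raise last_error
```

A call such as call_with_retry(lambda: scraper.scrape(url="https://example.com", output_format="markdown")) would wrap an existing SDK call without changing its arguments. Other exception types pass through untouched, so validation and API errors still surface immediately.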
Logging Usage Examples
Basic Logger Usage
from oxylabs_ai_studio.logger import get_logger
logger = get_logger(__name__) # Creates logger for current module
logger.info("Starting operation")
logger.warning("Potential issue detected")
logger.error("Operation failed")
Custom Logging Configuration
from oxylabs_ai_studio.logger import configure_logging
import logging
# Set DEBUG level with custom format
configure_logging(
    level=logging.DEBUG,
    format_string="%(asctime)s [%(levelname)s] %(name)s: %(message)s"
)
Module-Specific Logging
from oxylabs_ai_studio.logger import get_logger
# For SDK internal modules
browser_logger = get_logger("browser_agent")
scraper_logger = get_logger("ai_scraper")
# Child loggers propagate to parent
browser_logger.info("Browser agent started") # Logged as "oxylabs_ai_studio.browser_agent"
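Because child loggers live under the oxylabs_ai_studio namespace, standard logging rules for level inheritance and propagation apply to them. The sketch below demonstrates this with the standard library only: it attaches its own in-memory handler to the parent namespace and raises the level to WARNING, so it does not depend on the SDK's configure_logging().

```python
import io
import logging

# Attach a capture handler to the parent namespace used by the SDK
parent = logging.getLogger("oxylabs_ai_studio")
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(name)s - %(levelname)s - %(message)s"))
parent.addHandler(handler)
parent.setLevel(logging.WARNING)

# The child has no level of its own, so it inherits WARNING from the parent
child = logging.getLogger("oxylabs_ai_studio.browser_agent")
child.info("hidden")            # below WARNING, dropped
child.warning("slow response")  # propagates up to the parent's handler

captured = stream.getvalue().splitlines()
```

Setting the level on the parent is therefore enough to quiet or expand logging for every module at once.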
Best Practices
1. Configure Logging Early
Set up logging configuration before initializing SDK clients:
from oxylabs_ai_studio.logger import configure_logging
import logging
configure_logging(level=logging.DEBUG)
from oxylabs_ai_studio.apps.ai_scraper import AiScraper
scraper = AiScraper(api_key="your_key")
2. Handle Timeouts Appropriately
Wrap SDK calls in try-except blocks:
from oxylabs_ai_studio.apps.ai_scraper import AiScraper
from oxylabs_ai_studio.logger import get_logger
logger = get_logger(__name__)
scraper = AiScraper(api_key="your_key")
try:
    result = scraper.scrape(url="https://example.com", output_format="markdown")
except TimeoutError as e:
    logger.error(f"Scraping timed out: {e}")
except Exception as e:
    logger.error(f"Scraping failed: {e}")
3. Check Job Status for Errors
Always verify the result's message field:
result = scraper.scrape(url="https://example.com")
if result.message:
    logger.warning(f"Job completed with message: {result.message}")
if result.data is None:
    logger.error("Job failed - no data returned")
4. Handle User Cancellation
Gracefully handle keyboard interrupts:
from oxylabs_ai_studio.apps.ai_crawler import AiCrawler
from oxylabs_ai_studio.logger import get_logger
logger = get_logger(__name__)
crawler = AiCrawler(api_key="your_key")
try:
    result = crawler.crawl(url="https://example.com", user_prompt="Extract data")
except KeyboardInterrupt:
    logger.info("Crawl operation was cancelled by user")
    # Perform cleanup if needed
Summary
The oxylabs-ai-studio-py SDK provides a unified logging and error handling system that:
- Uses Python's standard logging module with package-scoped namespaces
- Configures logging automatically on module import
- Supports custom log levels, formats, and handlers
- Implements polling-based async operation handling with configurable timeouts
- Returns consistent Pydantic model responses with status information
- Provides user-friendly error messages and cancellation handling
- Follows a single pattern across all application modules for predictability
Sources: src/oxylabs_ai_studio/logger.py:1-14
Doramagic Pitfall Log
Source-linked risks stay visible on the manual page so the preview does not read like a recommendation.
Doramagic extracted 8 source-linked risk signals. Review them before installing or handing real data to the project.
1. Project risk: Project risk needs validation
- Severity: medium
- Finding: Project risk is backed by a source signal: Project risk needs validation. Treat it as a review item until the current version is checked.
- User impact: The project should not be treated as fully validated until this signal is reviewed.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: identity.distribution | github_repo:1003630893 | https://github.com/oxylabs/oxylabs-ai-studio-py | repo=oxylabs-ai-studio-py; install=oxylabs-ai-studio
2. Capability assumption: README/documentation is current enough for a first validation pass.
- Severity: medium
- Finding: README/documentation is current enough for a first validation pass.
- User impact: The project should not be treated as fully validated until this signal is reviewed.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: capability.assumptions | github_repo:1003630893 | https://github.com/oxylabs/oxylabs-ai-studio-py | README/documentation is current enough for a first validation pass.
3. Maintenance risk: v.0.2.19
- Severity: medium
- Finding: Maintenance risk is backed by a source signal: v.0.2.19. Treat it as a review item until the current version is checked.
- User impact: Users cannot judge support quality until recent activity, releases, and issue response are checked.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: Source-linked evidence: https://github.com/oxylabs/oxylabs-ai-studio-py/releases/tag/v0.2.19
4. Maintenance risk: Maintainer activity is unknown
- Severity: medium
- Finding: Maintenance risk is backed by a source signal: Maintainer activity is unknown. Treat it as a review item until the current version is checked.
- User impact: Users cannot judge support quality until recent activity, releases, and issue response are checked.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: evidence.maintainer_signals | github_repo:1003630893 | https://github.com/oxylabs/oxylabs-ai-studio-py | last_activity_observed missing
5. Security or permission risk: no_demo
- Severity: medium
- Finding: no_demo
- User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: downstream_validation.risk_items | github_repo:1003630893 | https://github.com/oxylabs/oxylabs-ai-studio-py | no_demo; severity=medium
6. Security or permission risk: no_demo
- Severity: medium
- Finding: no_demo
- User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: risks.scoring_risks | github_repo:1003630893 | https://github.com/oxylabs/oxylabs-ai-studio-py | no_demo; severity=medium
7. Maintenance risk: issue_or_pr_quality=unknown
- Severity: low
- Finding: issue_or_pr_quality=unknown.
- User impact: Users cannot judge support quality until recent activity, releases, and issue response are checked.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: evidence.maintainer_signals | github_repo:1003630893 | https://github.com/oxylabs/oxylabs-ai-studio-py | issue_or_pr_quality=unknown
8. Maintenance risk: release_recency=unknown
- Severity: low
- Finding: release_recency=unknown.
- User impact: Users cannot judge support quality until recent activity, releases, and issue response are checked.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: evidence.maintainer_signals | github_repo:1003630893 | https://github.com/oxylabs/oxylabs-ai-studio-py | release_recency=unknown
Source: Doramagic discovery, validation, and Project Pack records
Community Discussion Evidence
These external discussion links are review inputs, not standalone proof that the project is production-ready.
Open the linked issues or discussions before treating the pack as ready for your environment.
Doramagic exposes project-level community discussion separately from official documentation. Review these links before using oxylabs-ai-studio-py with real data or production workflows.
- v.0.2.19 - github / github_release
- Project risk needs validation - GitHub / issue
Source: Project Pack community evidence and pitfall evidence