# https://github.com/tinyfish-io/agentql Project Manual

Generated at: 2026-05-30 22:56:40 UTC

## Table of Contents

- [Introduction to AgentQL](#page-introduction)
- [Quick Start Guide](#page-quickstart)
- [Python SDK](#page-python-sdk)
- [JavaScript SDK](#page-javascript-sdk)
- [REST API](#page-rest-api)
- [AgentQL Query Language](#page-query-language)
- [Query Examples and Patterns](#page-query-examples)
- [Browser Modes and Configuration](#page-browser-modes)
- [Data Collection Patterns](#page-data-collection)
- [Integrations and Framework Connections](#page-integrations)

<a id='page-introduction'></a>

## Introduction to AgentQL

### Related Pages

Related topics: [Quick Start Guide](#page-quickstart), [Python SDK](#page-python-sdk)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [README.md](https://github.com/tinyfish-io/agentql/blob/main/README.md)
- [examples/python/first_steps/main.py](https://github.com/tinyfish-io/agentql/blob/main/examples/python/first_steps/main.py)
- [examples/python/list_query_usage/README.md](https://github.com/tinyfish-io/agentql/blob/main/examples/python/list_query_usage/README.md)
- [examples/js/first-steps/README.md](https://github.com/tinyfish-io/agentql/blob/main/examples/js/first-steps/README.md)
- [examples/js/collect-paginated-news-headlines/README.md](https://github.com/tinyfish-io/agentql/blob/main/examples/js/collect-paginated-news-headlines/README.md)
- [examples/js/submit-form/README.md](https://github.com/tinyfish-io/agentql/blob/main/examples/js/submit-form/README.md)
- [examples/js/collect-pricing-data/README.md](https://github.com/tinyfish-io/agentql/blob/main/examples/js/collect-pricing-data/README.md)
- [examples/js/collect-youtube-comments/README.md](https://github.com/tinyfish-io/agentql/blob/main/examples/js/collect-youtube-comments/README.md)
</details>

# Introduction to AgentQL

AgentQL is an open-source framework that connects Large Language Models (LLMs) and AI agents to the web through a natural language query language. It enables developers to extract structured data, automate web interactions, and build web scraping solutions using intuitive queries that remain resilient to UI changes over time.

## Overview

AgentQL addresses a fundamental challenge in web automation: traditional selectors (CSS, XPath) are brittle and break when web pages change. AgentQL uses natural language queries to locate elements and extract data, making automation scripts more maintainable and adaptable.

The framework integrates with Playwright and supports both Python and JavaScript. It works on any web page (public sites, private pages, and URLs behind authentication), regardless of the site's structure or technology.

Source: [README.md](https://github.com/tinyfish-io/agentql/blob/main/README.md)

## Core Features

| Feature | Description |
|---------|-------------|
| **Natural Language Selectors** | Find elements and data using intuitive queries based on page content |
| **Structured Output** | Define data shapes within queries for consistent structured results |
| **Cross-Site Compatibility** | Use the same query across different sites with similar content |
| **Transforms and Extracts** | Apply data transformations directly within queries |
| **Resilience to UI Changes** | Queries self-heal as page structures evolve |
| **Works on Any Page** | Public, private, authenticated—any URL |

Source: [README.md](https://github.com/tinyfish-io/agentql/blob/main/README.md)

## Architecture

AgentQL follows a client-side wrapper pattern where the AgentQL SDK wraps Playwright's page objects to extend their functionality with query capabilities.

```mermaid
graph TD
    A[Developer] -->|Writes AgentQL Query| B[AgentQL SDK]
    B -->|Wraps| C[Playwright Page Object]
    C -->|Interacts with| D[Web Page]
    D -->|Returns DOM| C
    C -->|Processes| B
    B -->|Structured JSON| A
    
    E[LLM Backend] <-->|Natural Language Processing| B
```

### Query Methods

The SDK provides three primary methods for interacting with web pages (Python names shown; the JavaScript SDK uses camelCase, e.g. `queryElements()`):

| Method | Purpose | Use Case |
|--------|---------|----------|
| `query_elements()` | Locate DOM elements | Automation, clicking, typing |
| `query_data()` | Extract structured data | Scraping, data collection |
| `get_by_prompt()` | Natural language element lookup | Finding elements by description |

Source: [examples/python/first_steps/main.py:1-80](https://github.com/tinyfish-io/agentql/blob/main/examples/python/first_steps/main.py)

## SDKs and Tools

AgentQL provides multiple entry points for different development environments:

### Python SDK

The Python SDK wraps Playwright pages and supports both Playwright's synchronous and asynchronous APIs. The example below uses the synchronous API.

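```python
import agentql
from playwright.sync_api import sync_playwright

SEARCH_BOX_QUERY = "{ search_product_box }"
PRODUCT_DATA_QUERY = "{ products[] { name price(integer) } }"

with sync_playwright() as playwright:
    browser = playwright.chromium.launch(headless=True)
    page = agentql.wrap(browser.new_page())  # AgentQL-enabled Playwright Page
    page.goto("https://scrapeme.live/shop")
    response = page.query_elements(SEARCH_BOX_QUERY)  # elements to interact with
    data = page.query_data(PRODUCT_DATA_QUERY)  # structured data as a dict
    browser.close()
```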

Installation: `pip install agentql`  
Documentation: [Python SDK Installation](https://docs.agentql.com/python-sdk/installation)

Source: [examples/python/first_steps/main.py:1-16](https://github.com/tinyfish-io/agentql/blob/main/examples/python/first_steps/main.py)

### JavaScript SDK

The JavaScript SDK works with Playwright for Node.js environments.

```javascript
import { chromium } from 'playwright';
import { wrap } from 'agentql';

async function main() {
  const browser = await chromium.launch();
  // wrap() returns an AgentQL-enabled Playwright Page
  const page = await wrap(await browser.newPage());
  // Use page.queryElements() and page.queryData() here
  await browser.close();
}

main();
```

Installation: `npm install agentql`  
Documentation: [JavaScript SDK Installation](https://docs.agentql.com/javascript-sdk/installation)

Source: [examples/js/first-steps/README.md](https://github.com/tinyfish-io/agentql/blob/main/examples/js/first-steps/README.md)

### REST API

Execute AgentQL queries without installing an SDK via the REST API endpoint.

Documentation: [REST API Reference](https://docs.agentql.com/rest-api/api-reference)

Source: [README.md](https://github.com/tinyfish-io/agentql/blob/main/README.md)
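A minimal sketch of preparing a REST call from Python with only the standard library. The endpoint URL (`https://api.agentql.com/v1/query-data`), the `X-API-Key` header, and the `url`/`query` body fields are assumptions to verify against the REST API Reference. The sketch only builds the request; sending it needs a real API key and network access.

```python
import json
import urllib.request

# Assumed endpoint; confirm against the REST API Reference before use.
QUERY_DATA_ENDPOINT = "https://api.agentql.com/v1/query-data"


def build_query_data_request(api_key: str, url: str, query: str) -> urllib.request.Request:
    """Build (but do not send) a POST request asking AgentQL to run `query` on `url`."""
    body = json.dumps({"url": url, "query": query}).encode("utf-8")
    return urllib.request.Request(
        QUERY_DATA_ENDPOINT,
        data=body,
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_query_data_request(
        "YOUR_API_KEY", "https://scrapeme.live/shop", "{ products[] { name price } }"
    )
    print(req.get_method(), req.full_url)
    # To send it (requires a valid key and network access):
    # with urllib.request.urlopen(req) as resp:
    #     print(json.load(resp))
```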

### Additional Tools

| Tool | Purpose |
|------|---------|
| **Debugger Chrome Extension** | Debug and refine queries in real-time on live sites |
| **Playground** | Interactive environment to test queries and export scripts |
| **AgentQL Query Language** | Define queries with natural language syntax |
| **MCP Server** | Integration for agent frameworks |
| **LangChain Integration** | Connect with LangChain for agentic workflows |

Source: [README.md](https://github.com/tinyfish-io/agentql/blob/main/README.md)

## Query Language

AgentQL queries use a GraphQL-like syntax to define what elements to find and what data to extract.

### Basic Element Query

```graphql
{
    search_product_box
    submit_button
    results_container
}
```

Source: [examples/python/first_steps/main.py:23-30](https://github.com/tinyfish-io/agentql/blob/main/examples/python/first_steps/main.py)

### Data Extraction Query

```graphql
{
    price_currency
    products[] {
        name
        price(integer)
    }
}
```

The `[]` notation queries a list of items. Text in parentheses, such as `(integer)`, describes how a value should be returned; here it asks for each price as an integer.

Source: [examples/python/first_steps/main.py:32-39](https://github.com/tinyfish-io/agentql/blob/main/examples/python/first_steps/main.py)
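Because queries are plain strings, they can also be generated. The sketch below is a hypothetical `to_agentql()` helper (not part of the SDK) that renders a nested Python dict into the syntax shown above: a `None` value is a leaf field, a dict value opens a nested block, and keys keep any `[]` or `(description)` suffix verbatim.

```python
def to_agentql(fields: dict, indent: int = 1) -> str:
    """Render a nested dict as an AgentQL query string (hypothetical helper)."""
    pad = "    " * indent
    lines = []
    for name, children in fields.items():
        if children is None:
            lines.append(f"{pad}{name}")  # leaf field
        else:
            lines.append(f"{pad}{name} {{")  # list or nested object
            lines.append(to_agentql(children, indent + 1))
            lines.append(f"{pad}}}")
    body = "\n".join(lines)
    # Only the outermost call wraps the body in braces
    return body if indent > 1 else "{\n" + body + "\n}"


if __name__ == "__main__":
    print(to_agentql({
        "price_currency": None,
        "products[]": {"name": None, "price(integer)": None},
    }))
```

Called as in the `__main__` block, it reproduces the data extraction query above.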

### Natural Language Prompt

For element location, you can use free-form natural language prompts:

```python
NATURAL_LANGUAGE_PROMPT = "Button to display Qwilfish page"
qwilfish_page_btn = page.get_by_prompt(NATURAL_LANGUAGE_PROMPT)
```

This approach finds elements based on semantic understanding rather than structural selectors.

Source: [examples/python/first_steps/main.py:42-47](https://github.com/tinyfish-io/agentql/blob/main/examples/python/first_steps/main.py)
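`get_by_prompt()` can return `None` when nothing matches, so a script may want to try a few phrasings in order. The hypothetical `first_match()` helper below takes any lookup callable; with AgentQL you would pass `page.get_by_prompt`, while the test uses a plain dictionary lookup as a stand-in.

```python
from typing import Callable, Iterable, Optional, TypeVar

T = TypeVar("T")


def first_match(lookup: Callable[[str], Optional[T]], prompts: Iterable[str]) -> Optional[T]:
    """Return the first non-None result of lookup(prompt), trying prompts in order."""
    for prompt in prompts:
        result = lookup(prompt)
        if result is not None:
            return result
    return None


# Usage with a wrapped page (the second prompt is an illustrative alternative):
# button = first_match(page.get_by_prompt, [
#     "Button to display Qwilfish page",
#     "Link to the Qwilfish product page",
# ])
```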

## Common Use Cases

### Collecting List Data

Extract multiple items from a page, such as product listings or search results:

```python
PRODUCT_DATA_QUERY = """
{
    products[] {
        name
        price
        link
    }
}
"""
data = page.query_data(PRODUCT_DATA_QUERY)
```

Source: [examples/python/list_query_usage/README.md](https://github.com/tinyfish-io/agentql/blob/main/examples/python/list_query_usage/README.md)

### Handling Pagination

Step through multiple pages to collect large datasets:

```javascript
// Collect HackerNews headlines across paginated pages.
// `page` is an AgentQL-wrapped Playwright page; HEADLINES_QUERY is assumed
// to define a top-level `headlines[]` list.
async function collectHeadlines(page, url, numPages) {
  const headlines = [];
  for (let i = 0; i < numPages; i++) {
    await page.goto(`${url}?p=${i + 1}`);
    const data = await page.queryData(HEADLINES_QUERY);
    headlines.push(...data.headlines);
  }
  return headlines;
}
```

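Source: [examples/js/collect-paginated-news-headlines/README.md](https://github.com/tinyfish-io/agentql/blob/main/examples/js/collect-paginated-news-headlines/README.md)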
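The same loop can be written in Python as a hypothetical helper that takes a `fetch_page(page_number)` callable, so the logic can be exercised without a browser. In a real script `fetch_page` would call `page.goto(...)` and `page.query_data(...)`; the `headlines` key is an assumption that must match your query. Items are de-duplicated because entries can shift from one page to the next while you paginate.

```python
import json
from typing import Callable


def collect_headlines(fetch_page: Callable[[int], dict], num_pages: int) -> list:
    """Collect `headlines` from pages 1..num_pages, skipping exact repeats."""
    seen = set()
    headlines = []
    for page_number in range(1, num_pages + 1):
        data = fetch_page(page_number)
        for item in data.get("headlines", []):
            key = json.dumps(item, sort_keys=True)  # items may be dicts, so build a hashable key
            if key not in seen:
                seen.add(key)
                headlines.append(item)
    return headlines
```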

### Form Automation

Fill out and submit forms using natural language queries:

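```javascript
const FORM_QUERY = `
{
    username_field
    password_field
    submit_button
}
`;
// queryElements() returns Playwright Locators keyed by query field name
const form = await page.queryElements(FORM_QUERY);
await form.username_field.fill('user@example.com');
await form.password_field.fill('your-password');
await form.submit_button.click();
```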

Source: [examples/js/submit-form/README.md](https://github.com/tinyfish-io/agentql/blob/main/examples/js/submit-form/README.md)

### E-commerce Data Collection

Extract pricing and product information from online stores:

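```python
SEARCH_BOX_QUERY = """
{
    search_product_box
}
"""

PRODUCT_DATA_QUERY = """
{
    price_currency
    products[] {
        name
        price(integer)
    }
}
"""

search_key_word = "fish"  # example search term
# Locate the search box, type the term with a per-keystroke delay, then submit
response = page.query_elements(SEARCH_BOX_QUERY)
response.search_product_box.type(search_key_word, delay=200)
page.keyboard.press("Enter")
data = page.query_data(PRODUCT_DATA_QUERY)
```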

Source: [examples/python/first_steps/main.py:31-60](https://github.com/tinyfish-io/agentql/blob/main/examples/python/first_steps/main.py)

### Waiting for Page Load

Ensure pages fully load before querying:

```javascript
await page.goto(url);
// Wait for network idle and dynamic content
await page.waitForLoadState('networkidle');
const data = await page.queryData(DATA_QUERY);
```

Source: [examples/js/wait-for-entire-page-load/README.md]()
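A load state does not always guarantee that late-rendering content is present. One general option, sketched below as a hypothetical `wait_until()` helper (not an AgentQL API), is to poll a check until it passes or a deadline expires; in a script the check might run a lightweight query and test that a list is non-empty.

```python
import time
from typing import Callable


def wait_until(check: Callable[[], bool], timeout: float = 10.0, interval: float = 0.5) -> bool:
    """Call check() every `interval` seconds until it returns True or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```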

## Integration Patterns

### With AI Agents

AgentQL is designed for AI agent workflows. The framework allows agents to:

1. Navigate to any URL
2. Query elements using natural language
3. Extract structured data
4. Perform actions (click, type, scroll)

```mermaid
graph LR
    A[AI Agent] -->|Instruction| B[AgentQL SDK]
    B -->|Query| C[Web Page]
    C -->|Data| D[Structured Output]
    D -->|Analysis| A
    A -->|Action| B
```

Source: [README.md](https://github.com/tinyfish-io/agentql/blob/main/README.md)
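The loop in the diagram can be sketched in plain Python. Every name here is hypothetical: `observe` stands in for a `query_data()` call, `decide` for the agent's policy (for example an LLM call), and `act` for an action on an element returned by `query_elements()`. The loop stops when the policy returns no action or a step limit is reached.

```python
from typing import Callable, Optional


def run_agent(
    observe: Callable[[], dict],
    decide: Callable[[dict], Optional[str]],
    act: Callable[[str], None],
    max_steps: int = 10,
) -> list:
    """Observe, decide, act until decide() returns None; return the actions taken."""
    history = []
    for _ in range(max_steps):
        observation = observe()       # e.g. structured data extracted from the page
        action = decide(observation)  # e.g. "click next_page_button"
        if action is None:
            break
        act(action)                   # e.g. click a located element
        history.append(action)
    return history
```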

### Cloudflare Workers Consideration

Users have explored using AgentQL with Cloudflare's Browser Rendering for edge environments. However, edge environments may have limitations with certain Node.js APIs that AgentQL depends on. See [Issue #128](https://github.com/tinyfish-io/agentql/issues/128) for community discussion on this integration pattern.

Source: [Community Issue #128](https://github.com/tinyfish-io/agentql/issues/128)

## Getting Started

### Prerequisites

- Python 3.8+ or Node.js 18+
- Playwright installed

### Installation

**Python:**
```bash
pip install agentql
playwright install chromium
```

**JavaScript:**
```bash
npm install agentql
npx playwright install chromium
```

### Quick Start Steps

1. Install the AgentQL SDK for your language
2. Launch a browser with Playwright
3. Wrap the page object with `agentql.wrap()`
4. Write your first AgentQL query
5. Use `query_elements()` for actions or `query_data()` for extraction
6. Optional: Install the [AgentQL Debugger Chrome Extension](https://chromewebstore.google.com/detail/agentql-debugger/idnejmodeepdobpinkkgpkeabkabhhej) to test queries on live sites

### Testing Your Queries

The AgentQL Playground at [playground.agentql.com](https://playground.agentql.com/) allows you to:
- Test queries on live websites
- Export working Python/JavaScript scripts
- Optimize query patterns

Source: [README.md](https://github.com/tinyfish-io/agentql/blob/main/README.md)

## Community Resources

| Resource | Link |
|----------|------|
| Documentation | [docs.agentql.com](https://docs.agentql.com) |
| Discord Community | [discord.gg/agentql](https://discord.gg/agentql) |
| X (Twitter) | [@agentql](https://twitter.com/agentql) |
| LinkedIn | [tinyfish-ai](https://www.linkedin.com/company/tinyfish-ai) |
| Deep-dive Article | [Starlog Analysis](https://starlog.is/articles/automation/tinyfish-io-agentql) |

## Known Limitations

- Element resolution may occasionally return generic containers instead of specific elements (see [Issue #121](https://github.com/tinyfish-io/agentql/issues/121))
- Edge environment compatibility requires additional configuration for Cloudflare Workers ([Issue #128](https://github.com/tinyfish-io/agentql/issues/128))

## Summary

AgentQL bridges the gap between LLMs and web automation by providing a natural language query interface that abstracts away brittle CSS/XPath selectors. Its dual Python and JavaScript SDKs integrate with Playwright, making it accessible for both backend automation scripts and modern web agent frameworks. The structured output capability, combined with transforms and cross-site compatibility, makes AgentQL a robust choice for building maintainable web scraping and automation solutions.

---

<a id='page-quickstart'></a>

## Quick Start Guide

### Related Pages

Related topics: [Python SDK](#page-python-sdk), [JavaScript SDK](#page-javascript-sdk)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [examples/python/first_steps/main.py](https://github.com/tinyfish-io/agentql/blob/main/examples/python/first_steps/main.py)
- [examples/python/news-aggregator/main.py](https://github.com/tinyfish-io/agentql/blob/main/examples/python/news-aggregator/main.py)
- [examples/python/news-aggregator/main_sync.py](https://github.com/tinyfish-io/agentql/blob/main/examples/python/news-aggregator/main_sync.py)
- [examples/python/collect_paginated_news_headlines/README.md](https://github.com/tinyfish-io/agentql/blob/main/examples/python/collect_paginated_news_headlines/README.md)
- [examples/js/collect-paginated-news-headlines/README.md](https://github.com/tinyfish-io/agentql/blob/main/examples/js/collect-paginated-news-headlines/README.md)
</details>

# Quick Start Guide

AgentQL is a query language and SDK designed to connect LLMs and AI agents to the web. This guide provides everything you need to start using AgentQL within 5 minutes, whether you're using Python or JavaScript.

## Prerequisites

Before beginning, ensure you have the following installed:

| Requirement | Version | Purpose |
|-------------|---------|---------|
| Python | 3.8+ | For Python SDK usage |
| Node.js | 18+ | For JavaScript SDK usage |
| Playwright | Latest | Browser automation |
| AgentQL SDK | Latest | Core library |
| AgentQL API key | N/A | Set as the `AGENTQL_API_KEY` environment variable |

### Python SDK Installation

Install the AgentQL Python SDK using pip:

```bash
pip install agentql
```

Install Playwright with the required browsers:

```bash
pip install playwright
playwright install chromium
```

### JavaScript SDK Installation

Install the AgentQL JavaScript SDK using npm:

```bash
npm install agentql
npx playwright install chromium
```

## Core Concepts

Understanding these fundamental concepts will help you write effective AgentQL queries:

### AgentQL Query Language

AgentQL uses a GraphQL-like query syntax to describe what data to extract or which elements to interact with on a web page. Field names are written in plain language, which keeps queries readable and self-documenting.

```graphql
{
    search_product_box
    products[] {
        name
        price(integer)
    }
}
```

Source: [examples/python/first_steps/main.py:29-36](https://github.com/tinyfish-io/agentql/blob/main/examples/python/first_steps/main.py)

### Smart Locator vs Data Query API

AgentQL provides two distinct APIs:

| API Type | Method | Purpose |
|----------|--------|---------|
| Smart Locator | `query_elements()` | Locate elements for interaction |
| Data Query | `query_data()` | Extract structured data |

## Your First Script

### Python Quick Start

Create a new file named `main.py` and add the following code:

```python
#!/usr/bin/env python3
import agentql
from agentql.ext.playwright.sync_api import Page
from playwright.sync_api import sync_playwright

URL = "https://scrapeme.live/shop"

# Query to locate the search box element
SEARCH_BOX_QUERY = """
{
    search_product_box
}
"""

# Query for data extraction
PRODUCT_DATA_QUERY = """
{
    price_currency
    products[] {
        name
        price(integer)
    }
}
"""

def main():
    with sync_playwright() as playwright, playwright.chromium.launch(headless=False) as browser:
        page = agentql.wrap(browser.new_page())
        page.goto(URL)
        
        product_data = page.query_data(PRODUCT_DATA_QUERY)
        print(product_data)

if __name__ == "__main__":
    main()
```

Source: [examples/python/first_steps/main.py:1-45](https://github.com/tinyfish-io/agentql/blob/main/examples/python/first_steps/main.py)

Run the script:

```bash
python3 main.py
```

### JavaScript Quick Start

Create a new file named `main.js`:

```javascript
const agentql = require('agentql');
const { chromium } = require('playwright');

const URL = "https://scrapeme.live/shop";

const PRODUCT_QUERY = `
{
    price_currency
    products[] {
        name
        price(integer)
    }
}
`;

async function main() {
    const browser = await chromium.launch({ headless: false });
    const page = await agentql.wrap(await browser.newPage());
    
    await page.goto(URL);
    const productData = await page.queryData(PRODUCT_QUERY);
    console.log(productData);
    
    await browser.close();
}

main();
```

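Source: [examples/js/collect-paginated-news-headlines/README.md:18-36](https://github.com/tinyfish-io/agentql/blob/main/examples/js/collect-paginated-news-headlines/README.md)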

Run the script:

```bash
node main.js
```

## Workflow Overview

```mermaid
graph TD
    A[Install AgentQL SDK] --> B[Import AgentQL Library]
    B --> C[Launch Browser with Playwright]
    C --> D[Wrap Page with AgentQL]
    D --> E[Write AgentQL Query]
    E --> F[Execute Query]
    F --> G[Process Results]
    G --> H[Close Browser]
```

## Common Usage Patterns

### Extracting Paginated Data

To collect data across multiple pages, use a loop with navigation:

```python
import asyncio
import agentql
from playwright.async_api import async_playwright

# QUERY is an AgentQL query defining a top-level `items[]` list

async def collect_paginated_news():
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        # new_page() is a coroutine: await it before wrapping
        page = await agentql.wrap_async(await browser.new_page())

        all_items = []
        for page_num in range(3):  # Collect 3 pages
            await page.goto(f"https://news.ycombinator.com/news?p={page_num + 1}")
            data = await page.query_data(QUERY)
            all_items.extend(data.get("items", []))

        await browser.close()
        return all_items

if __name__ == "__main__":
    print(asyncio.run(collect_paginated_news()))
```

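Source: [examples/python/collect_paginated_news_headlines/README.md:1-22](https://github.com/tinyfish-io/agentql/blob/main/examples/python/collect_paginated_news_headlines/README.md)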

### Multi-URL Data Collection

Fetch data from multiple websites concurrently using async patterns:

```python
import asyncio
import agentql
from agentql.ext.playwright.async_api import Page
from playwright.async_api import async_playwright

WEBSITE_URLS = [
    "https://duckduckgo.com/?q=agents+for+the+web&t=h_&iar=news&ia=news",
]

async def main():
    async with async_playwright() as p:
        async with await p.chromium.launch(headless=True) as browser:
            async with await browser.new_context() as context:
                await asyncio.gather(
                    *(fetch_data(context, url) for url in WEBSITE_URLS)
                )

async def fetch_data(context, session_url):
    page = await agentql.wrap_async(await context.new_page())
    await page.goto(session_url)
    data = await page.query_data(QUERY)
    return data
```

Source: [examples/python/news-aggregator/main.py:17-36](https://github.com/tinyfish-io/agentql/blob/main/examples/python/news-aggregator/main.py)
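`asyncio.gather()` over every URL opens all pages at once. For long URL lists, one common option is to cap concurrency with an `asyncio.Semaphore`. The `gather_bounded()` helper below is a hypothetical sketch; in the example above, `fetch` would be `lambda url: fetch_data(context, url)`.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List, TypeVar

T = TypeVar("T")


async def gather_bounded(
    fetch: Callable[[str], Awaitable[T]], urls: Iterable[str], limit: int = 3
) -> List[T]:
    """Run fetch(url) for every URL with at most `limit` in flight; results keep input order."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(url: str) -> T:
        async with semaphore:  # blocks while `limit` fetches are already running
            return await fetch(url)

    return await asyncio.gather(*(guarded(url) for url in urls))
```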

### Synchronous vs Asynchronous Execution

AgentQL supports both synchronous and asynchronous patterns:

| Pattern | Use Case | API |
|---------|----------|-----|
| Synchronous | Simple scripts, sequential operations | `agentql.wrap()` |
| Asynchronous | Concurrent operations, better performance | `agentql.wrap_async()` |

**Synchronous example:**

```python
import agentql
from playwright.sync_api import sync_playwright

# URL and QUERY are defined as in the full example
def main():
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = agentql.wrap(browser.new_page())  # synchronous wrapper
        page.goto(URL)
        data = page.query_data(QUERY)
        browser.close()
        return data
```

Source: [examples/python/news-aggregator/main_sync.py:17-27](https://github.com/tinyfish-io/agentql/blob/main/examples/python/news-aggregator/main_sync.py)

## Running Examples in Google Colab

You can run AgentQL examples directly in Google Colab without local installation:

1. Navigate to the [Google Colab example](https://github.com/tinyfish-io/agentql/tree/main/examples/python/run_script_online_in_google_colab)
2. Open `main.ipynb` in Colab
3. Run cells sequentially

This approach is useful for quick experimentation without setting up a local environment.

## Writing Effective Queries

### Querying Lists

Use array syntax `[]` to query multiple elements:

```graphql
{
    products[] {
        name
        price
        description
    }
}
```

### Data Type Transformations

Apply type conversions within queries:

```graphql
{
    products[] {
        name
        price(integer)  # Convert to integer
        rating(float)   # Convert to float
    }
}
```

Source: [examples/python/first_steps/main.py:34-36](https://github.com/tinyfish-io/agentql/blob/main/examples/python/first_steps/main.py)
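Because these values are produced from a natural-language description, it can be worth checking their types before using them. The `check_field_types()` helper below is a hypothetical sketch; passing a tuple such as `(int, float)` accepts either type.

```python
def check_field_types(items: list, expected: dict) -> list:
    """Return (index, field, value) for each item field whose type does not match `expected`."""
    problems = []
    for index, item in enumerate(items):
        for field, expected_type in expected.items():
            value = item.get(field)
            if not isinstance(value, expected_type):
                problems.append((index, field, value))
    return problems
```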

### Natural Language Prompts

For element location, use natural language prompts:

```python
NATURAL_LANGUAGE_PROMPT = "Button to display Qwilfish page"
```

This allows flexible element selection based on descriptive intent rather than CSS selectors.

## Troubleshooting Common Issues

### Element Resolution Problems

If elements resolve as "useless span" or fail to locate expected elements:

- Verify the URL matches the expected page structure
- Use the [AgentQL Debugger Chrome extension](https://docs.agentql.com/installation/chrome-extension-installation) to test queries
- Check that the page has fully loaded before querying

Source: [issues/tinyfish-io/agentql#121](https://github.com/tinyfish-io/agentql/issues/121)

### Cloudflare Browser Rendering

When using AgentQL with Cloudflare's Browser Rendering:

- Edge environments may have Node.js API limitations
- Some synchronous Playwright APIs may not be available
- Consider using async patterns for edge compatibility

Source: [issues/tinyfish-io/agentql#128](https://github.com/tinyfish-io/agentql/issues/128)

## Next Steps

After completing this quick start guide:

| Resource | Description |
|----------|-------------|
| [AgentQL Query Language](https://docs.agentql.com/agentql-query/query-intro) | Deep dive into query syntax |
| [Python SDK Reference](https://docs.agentql.com/python-sdk/installation) | Complete API documentation |
| [JavaScript SDK Reference](https://docs.agentql.com/javascript-sdk/installation) | JS API documentation |
| [Examples Repository](https://github.com/tinyfish-io/agentql/tree/main/examples) | Full example collection |
| [Discord Community](https://discord.gg/agentql) | Get help and share feedback |

## Key Takeaways

1. **Installation is straightforward** - A single package install gets you started
2. **Two API modes** - Choose sync for simplicity or async for performance
3. **Natural language queries** - Write queries that describe intent, not selectors
4. **Structured output** - Data returns in the shape you define in your query
5. **Cross-site compatibility** - Queries work across similar sites with comparable content

Get started in 5 minutes by running the example scripts above, then explore the [official documentation](https://docs.agentql.com) for advanced features and integrations.

---

<a id='page-python-sdk'></a>

## Python SDK

### Related Pages

Related topics: [JavaScript SDK](#page-javascript-sdk), [Browser Modes and Configuration](#page-browser-modes)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [examples/python/first_steps/main.py](https://github.com/tinyfish-io/agentql/blob/main/examples/python/first_steps/main.py)
- [examples/python/news-aggregator/main.py](https://github.com/tinyfish-io/agentql/blob/main/examples/python/news-aggregator/main.py)
- [examples/python/news-aggregator/main_sync.py](https://github.com/tinyfish-io/agentql/blob/main/examples/python/news-aggregator/main_sync.py)
- [examples/python/list_query_usage/main.py](https://github.com/tinyfish-io/agentql/blob/main/examples/python/list_query_usage/main.py)
- [examples/python/get_by_prompt/main.py](https://github.com/tinyfish-io/agentql/blob/main/examples/python/get_by_prompt/main.py)
- [examples/python/compare_product_prices/main.py](https://github.com/tinyfish-io/agentql/blob/main/examples/python/compare_product_prices/main.py)
- [examples/python/maps_scraper/main.py](https://github.com/tinyfish-io/agentql/blob/main/examples/python/maps_scraper/main.py)
- [examples/python/collect_paginated_news_headlines/README.md](https://github.com/tinyfish-io/agentql/blob/main/examples/python/collect_paginated_news_headlines/README.md)
- [README.md](https://github.com/tinyfish-io/agentql/blob/main/README.md)
- [golden-images.yaml](https://github.com/tinyfish-io/agentql/blob/main/golden-images.yaml)

</details>

# Python SDK

The AgentQL Python SDK provides a powerful interface for connecting LLMs and AI agents to the web through structured data queries and intelligent element location. Built as a wrapper around Microsoft Playwright, the SDK enables developers to extract structured data, interact with web elements, and automate browser workflows using AgentQL's query language and natural language prompts.

## Overview

The Python SDK serves as the primary programming interface for Python developers building web automation, data extraction, and AI agent applications. It wraps Playwright's Page objects to provide AgentQL-specific querying capabilities while maintaining full access to Playwright's browser automation features.

### Key Capabilities

| Capability | Description |
|------------|-------------|
| Structured Data Extraction | Query web pages using AgentQL's query language to extract typed, structured data |
| Natural Language Element Selection | Locate elements using intuitive prompts instead of CSS selectors |
| Cross-Site Compatibility | Write queries once and use them across similar websites |
| Dual API Support | Available in both synchronous and asynchronous implementations |
| Playwright Integration | Full access to Playwright's browser automation features |

Source: [README.md:1-15](https://github.com/tinyfish-io/agentql/blob/main/README.md)

## Installation

### Prerequisites

- Python 3 (the Quick Start lists 3.8+; the repository's golden images use Python 3.13)
- Playwright browser binaries installed

### Installation via pip

```bash
pip install agentql
```

### Browser Binary Setup

After installing the SDK, initialize Playwright browsers:

```bash
playwright install chromium
```

The repository's `golden-images.yaml` pins its reference environments: Python 3.13 on a Debian 12 (Bookworm) slim base image, and Playwright v1.58.2 on Ubuntu 24.04 LTS.

Source: [golden-images.yaml:1-30](https://github.com/tinyfish-io/agentql/blob/main/golden-images.yaml)

## Core API Methods

### Wrapping a Page Object

To access AgentQL's querying capabilities, wrap a Playwright page object using `agentql.wrap()`:

```python
import agentql
from playwright.sync_api import sync_playwright

with sync_playwright() as playwright:
    browser = playwright.chromium.launch(headless=True)
    page = agentql.wrap(browser.new_page())
```

Source: [examples/python/first_steps/main.py:35-39](https://github.com/tinyfish-io/agentql/blob/main/examples/python/first_steps/main.py)

### query_data()

Extracts structured data from the page using an AgentQL query. Returns a dictionary matching the query structure.

```python
PRODUCT_DATA_QUERY = """
{
    price_currency
    products[] {
        name
        price(integer)
    }
}
"""

data = page.query_data(PRODUCT_DATA_QUERY)
print(data)
```

**Parameters:**

| Parameter | Type | Description |
|-----------|------|-------------|
| query | str | AgentQL query defining the data structure to extract |
| timeout | int | Optional. Time limit, in seconds, for the request to the AgentQL backend; see the Python SDK API reference for the default |

**Returns:** Dictionary with keys matching the query fields

Source: [examples/python/first_steps/main.py:30-34](https://github.com/tinyfish-io/agentql/blob/main/examples/python/first_steps/main.py)

### query_elements()

Locates DOM elements matching an AgentQL query, returning element references that can be interacted with using Playwright's API.

```python
SEARCH_BOX_QUERY = """
{
    search_product_box
}
"""

response = page.query_elements(SEARCH_BOX_QUERY)
response.search_product_box.type("fish", delay=200)
page.keyboard.press("Enter")
```

**Parameters:**

| Parameter | Type | Description |
|-----------|------|-------------|
| query | str | AgentQL query defining elements to locate |
| timeout | int | Optional. Time limit, in seconds, for the request to the AgentQL backend; see the Python SDK API reference for the default |

**Returns:** Object with attributes matching query field names, containing Playwright Locator objects

Source: [examples/python/first_steps/main.py:52-59](https://github.com/tinyfish-io/agentql/blob/main/examples/python/first_steps/main.py)

### get_by_prompt()

Locates elements using natural language prompts. This method uses AI to find elements based on their semantic meaning rather than DOM structure.

```python
# Locate the search bar using natural language
search_bar = page.get_by_prompt("the search bar")
search_bar.fill("AgentQL")

# Click a button using a description
page.get_by_prompt("the search button").click()
```

**Parameters:**

| Parameter | Type | Description |
|-----------|------|-------------|
| prompt | str | Natural language description of the element |
| timeout | int | Optional. Time limit, in seconds, for the request to the AgentQL backend; see the Python SDK API reference for the default |

**Returns:** Playwright Locator object for the matched element, or None if not found

Source: [examples/python/get_by_prompt/main.py:18-26](https://github.com/tinyfish-io/agentql/blob/main/examples/python/get_by_prompt/main.py)

## Asynchronous API

For applications requiring concurrent operations, use the async API with `async_playwright` and `agentql.wrap_async()`:

```python
import asyncio
import agentql
from playwright.async_api import async_playwright

# QUERY is an AgentQL query string defined elsewhere

async def main():
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        async with await browser.new_context() as context:
            # context.new_page() is a coroutine: await it before wrapping
            page = await agentql.wrap_async(await context.new_page())
            await page.goto("https://example.com")
            data = await page.query_data(QUERY)
            print(data)
        await browser.close()

asyncio.run(main())
```

Source: [examples/python/news-aggregator/main.py:28-38](https://github.com/tinyfish-io/agentql/blob/main/examples/python/news-aggregator/main.py)

### Concurrent Page Operations

The async API enables concurrent data fetching from multiple pages:

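```python
import asyncio
import agentql
from playwright.async_api import async_playwright

# WEBSITE_URLS (a list of URLs) and QUERY are defined as in the full example

async def main():
    async with async_playwright() as p, await p.chromium.launch(headless=True) as browser:
        async with await browser.new_context() as context:
            results = await asyncio.gather(
                *(fetch_data(context, url) for url in WEBSITE_URLS)
            )
            print(results)

async def fetch_data(context, url):
    # Each URL gets its own page in the shared context
    page = await agentql.wrap_async(await context.new_page())
    await page.goto(url)
    return await page.query_data(QUERY)
```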

Source: [examples/python/news-aggregator/main.py:28-44](https://github.com/tinyfish-io/agentql/blob/main/examples/python/news-aggregator/main.py)

## Common Usage Patterns

### E-commerce Data Extraction

Extract product information from e-commerce websites:

```python
QUERY = """
{
    products[]
    {
        name
        price(integer)
    }
}
"""

page.goto("https://scrapeme.live/shop")
response = page.query_data(QUERY)

# Write to CSV
with open("product_data.csv", "w", encoding="utf-8") as file:
    file.write("Name, Price\n")
    for product in response["products"]:
        file.write(f"{product['name']},{product['price']}\n")
```

Source: [examples/python/list_query_usage/main.py:14-30](https://github.com/tinyfish-io/agentql/blob/main/examples/python/list_query_usage/main.py)
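Hand-formatted lines like the ones above break when a product name contains a comma. The standard `csv` module handles quoting; the hypothetical `write_rows()` helper below writes list items to any text stream, leaving missing fields empty and ignoring extra keys.

```python
import csv
import io
from typing import IO, Iterable, List


def write_rows(items: Iterable[dict], fieldnames: List[str], stream: IO[str]) -> None:
    """Write dicts as CSV rows with a header; missing fields become empty cells."""
    writer = csv.DictWriter(stream, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(items)


if __name__ == "__main__":
    buffer = io.StringIO()
    write_rows([{"name": "Bulbasaur, plush", "price": 63}], ["name", "price"], buffer)
    print(buffer.getvalue())
```

With a real response, pass `response["products"]` and a file opened with `newline=""`, as the `csv` module documentation recommends.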

### Multi-Site Price Comparison

Compare product prices across different websites using the same query:

```python
PRODUCT_INFO_QUERY = """
{
    nintendo_switch_price
}
"""

page.goto(NINTENDO_URL)
response = page.query_data(PRODUCT_INFO_QUERY)
print("Price at Nintendo: ", response["nintendo_switch_price"])

page.goto(TARGET_URL)
response = page.query_data(PRODUCT_INFO_QUERY)
print("Price at Target: ", response["nintendo_switch_price"])
```

Source: [examples/python/compare_product_prices/main.py:20-31](https://github.com/tinyfish-io/agentql/blob/main/examples/python/compare_product_prices/main.py)
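This query has no parenthesized description, so each response may come back as a display string such as `$299.99`. A hypothetical way to compare them is to parse the first number with `Decimal`; the parser assumes commas are thousands separators, so formats like `1.299,99` would need different handling.

```python
import re
from decimal import Decimal
from typing import Dict, Optional


def parse_price(text: Optional[str]) -> Optional[Decimal]:
    """Extract the first number from a price string like '$1,299.99'."""
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text or "")
    return Decimal(match.group().replace(",", "")) if match else None


def cheapest(prices: Dict[str, str]) -> Optional[str]:
    """Return the store whose price string parses to the lowest value, or None."""
    parsed = {store: parse_price(text) for store, text in prices.items()}
    valid = {store: value for store, value in parsed.items() if value is not None}
    return min(valid, key=valid.get) if valid else None
```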

### List Data Extraction

Query lists of items on a page:

```python
QUERY = """
{
    listings[]
    {
        name
        rating
        description
        order_link
        take_out_link
        address
        hours
    }
}
"""

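response = page.query_data(QUERY)

# `file` is a text file opened earlier in the script (not shown in this excerpt).
# The f-string is abbreviated: `...` stands for the remaining fields and is not meant literally.
for listing in response["listings"]:
    file.write(
        f"{listing['name']},{listing['rating']},{listing['description']}...\n"
    )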
```

Source: [examples/python/maps_scraper/main.py:1-15](https://github.com/tinyfish-io/agentql/blob/main/examples/python/maps_scraper/main.py)
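
Not every listing has every field. For example, a place without take-out has no `take_out_link`. The following minimal sketch fills those gaps before writing. It assumes that fields AgentQL could not find come back as `None` or are absent from the dict. `normalize_listing` is a hypothetical helper.

```python
LISTING_FIELDS = ["name", "rating", "description", "order_link", "take_out_link", "address", "hours"]


def normalize_listing(listing, fields=LISTING_FIELDS, default="n/a"):
    """Return the listing's values in `fields` order, using `default` for missing or None values."""
    return [default if listing.get(field) is None else str(listing[field]) for field in fields]


row = normalize_listing({"name": "Corner Cafe", "rating": 4.5, "take_out_link": None})
```

The resulting list can be passed to `csv.writer(file).writerow(row)`, which handles quoting. Another option is to request the fallback in the query itself, as the news-aggregator query does with `return "n/a" if not available`.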

### Paginated Data Collection

Collect data across multiple pages by navigating through pagination:

```python
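all_results = []
# BASE_URL, num_pages, and QUERY are defined elsewhere in the example;
# start the range at 1 instead if the site numbers its pages from 1
for page_num in range(num_pages):
    page.goto(f"{BASE_URL}&page={page_num}")
    data = page.query_data(QUERY)
    all_results.extend(data["items"])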
```

Source: [examples/python/collect_paginated_news_headlines/README.md:1-20](https://github.com/tinyfish-io/agentql/blob/main/examples/python/collect_paginated_news_headlines/README.md)
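
The loop above assumes the number of pages is known in advance. The following minimal sketch shows a variant that instead stops at the first empty page. `fetch_page` is a hypothetical callable that would wrap `page.goto(...)` and `page.query_data(QUERY)` and return the response dict. The sketch also assumes 1-based page numbers.

```python
def collect_all_pages(fetch_page, max_pages=50, key="items"):
    """Call fetch_page(n) for n = 1, 2, ... until a page has no items or max_pages is reached."""
    results = []
    for page_num in range(1, max_pages + 1):
        items = fetch_page(page_num).get(key) or []
        if not items:  # empty or missing list: assume we ran past the last page
            break
        results.extend(items)
    return results


# Fake data source standing in for the browser: three pages of two items each
fake_pages = {1: ["a", "b"], 2: ["c", "d"], 3: ["e", "f"]}
collected = collect_all_pages(lambda n: {"items": fake_pages.get(n, [])})
```

`max_pages` acts as a safety cap in case a site keeps returning results indefinitely.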

## Data Transformations

AgentQL queries accept inline instructions in parentheses after a field name, which AgentQL applies to the extracted value:

```python
QUERY = """
{
    items[]{
        published_date(convert to XX/XX/XXXX format)
        entry(title or post if no title is available)
        author(person's name; return "n/a" if not available)
        outlet(the original platform it is posted on)
        url
    }
}
"""
```

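Because these instructions are written in natural language, they can express:
- Type conversions (e.g., `price(integer)`)
- Date format transformations (e.g., `convert to XX/XX/XXXX format`)
- Default values for missing fields (e.g., `return "n/a" if not available`)
- Fallback logic (e.g., `title or post if no title is available`)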

Source: [examples/python/news-aggregator/main_sync.py:20-28](https://github.com/tinyfish-io/agentql/blob/main/examples/python/news-aggregator/main_sync.py)
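
Transforms such as `convert to XX/XX/XXXX format` are natural-language instructions, so it is worth validating their output downstream. The following is a minimal sketch. It assumes the month-first `MM/DD/YYYY` reading of `XX/XX/XXXX`, because the query itself does not say which order is meant. `parse_published_date` is a hypothetical helper.

```python
from datetime import datetime


def parse_published_date(value):
    """Parse an MM/DD/YYYY string into a date; return None for anything else (e.g. 'n/a')."""
    try:
        return datetime.strptime(value, "%m/%d/%Y").date()
    except (TypeError, ValueError):  # TypeError covers None, ValueError covers bad formats
        return None


parsed = parse_published_date("05/30/2026")
```

Rows where this returns `None` can be logged or re-queried instead of silently written with an unexpected format.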

## Advanced Configuration

### Headless Mode

Run browsers in headless mode for server-side or CI environments:

```python
from playwright.sync_api import sync_playwright

with sync_playwright() as playwright:
    browser = playwright.chromium.launch(headless=True)  # Playwright's default; suited to servers and CI
```

For debugging, disable headless mode:

```python
from playwright.sync_api import sync_playwright

with sync_playwright() as playwright:
    browser = playwright.chromium.launch(headless=False)  # visible browser window for debugging
```
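
Instead of editing the `launch` call when switching between CI and local debugging, you can read the flag from the environment. The following is a minimal sketch. It assumes a hypothetical `HEADLESS` variable, which neither AgentQL nor Playwright reads on its own.

```python
import os


def headless_from_env(env=None, default=True):
    """HEADLESS=0/false/no/off gives a visible browser, any other value gives headless,
    and an unset variable gives `default`."""
    env = os.environ if env is None else env
    value = env.get("HEADLESS")
    if value is None:
        return default
    return value.strip().lower() not in {"0", "false", "no", "off"}
```

Usage: `playwright.chromium.launch(headless=headless_from_env())`, then run `HEADLESS=false python main.py` locally to watch the browser.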

### Browser Contexts

Use browser contexts to isolate sessions, cookies, and state:

```python
async with await browser.new_context() as context:
    # Each context has independent storage
    page = await agentql.wrap_async(context.new_page())
```

### Logging Configuration

Configure logging for debugging and monitoring:

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger(__name__)

log.info("All done! CSV is here: %s", CSV_FILE_PATH)
```

Source: [examples/python/news-aggregator/main.py:14-15](https://github.com/tinyfish-io/agentql/blob/main/examples/python/news-aggregator/main.py)

## Architecture

```mermaid
graph TD
    A[Python Application] --> B[AgentQL SDK]
    B --> C[Playwright API]
    C --> D[Browser Instance]
    E[AgentQL Query Language] --> B
    F[Natural Language Prompts] --> B
    G[Web Page DOM] --> D
    D --> H[Structured Data Response]
    B --> H
    
    subgraph "AgentQL SDK Components"
        B
        I[query_data method]
        J[query_elements method]
        K[get_by_prompt method]
    end
    
    I --> B
    J --> B
    K --> B
```

## Relationship to JavaScript SDK

The Python SDK closely parallels the JavaScript SDK. Method names follow each language's convention (`snake_case` in Python, `camelCase` in JavaScript), and the JavaScript methods return Promises:

| Feature | Python SDK | JavaScript SDK |
|---------|-------------|----------------|
| Wrap Page | `agentql.wrap(page)` | `await wrap(page)` |
| Async Wrap | `await agentql.wrap_async(page)` | Not needed: `wrap()` already returns a Promise |
| Query Data | `page.query_data(QUERY)` | `await page.queryData(QUERY)` |
| Query Elements | `page.query_elements(QUERY)` | `await page.queryElements(QUERY)` |
| By Prompt | `page.get_by_prompt("text")` | `await page.getByPrompt("text")` |

Both SDKs use the same AgentQL query language and provide broadly equivalent functionality for their respective platforms.

## See Also

- [JavaScript SDK](#page-javascript-sdk) - For Node.js environments
- [REST API](#page-rest-api) - Query execution over HTTP without a local browser
- [AgentQL Query Language](#page-query-language) - Query syntax reference
- [Chrome Extension](https://docs.agentql.com/installation/chrome-extension-installation) - Debug and develop queries interactively
- [Examples Repository](https://github.com/tinyfish-io/agentql/tree/main/examples/python) - Complete working examples

---

<a id='page-javascript-sdk'></a>

## JavaScript SDK

### Related Pages

Related topics: [Python SDK](#page-python-sdk), [REST API](#page-rest-api)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [examples/js/package.json](https://github.com/tinyfish-io/agentql/blob/main/examples/js/package.json)
- [examples/js/get-by-prompt/main.js](https://github.com/tinyfish-io/agentql/blob/main/examples/js/get-by-prompt/main.js)
- [examples/js/news-aggregator/main.js](https://github.com/tinyfish-io/agentql/blob/main/examples/js/news-aggregator/main.js)
- [examples/js/collect-pricing-data/main.js](https://github.com/tinyfish-io/agentql/blob/main/examples/js/collect-pricing-data/main.js)
- [examples/js/collect-paginated-ecommerce-data/main.js](https://github.com/tinyfish-io/agentql/blob/main/examples/js/collect-paginated-ecommerce-data/main.js)
- [examples/js/collect-paginated-news-headlines/README.md](https://github.com/tinyfish-io/agentql/blob/main/examples/js/collect-paginated-news-headlines/README.md)
- [examples/js/collect-paginated-ecommerce-data/README.md](https://github.com/tinyfish-io/agentql/blob/main/examples/js/collect-paginated-ecommerce-data/README.md)
</details>

# JavaScript SDK

The AgentQL JavaScript SDK lets developers build web automation and scraping applications using natural language queries. It integrates with Playwright, so JavaScript and Node.js developers can use AgentQL's query language to extract structured data from web pages.

## Overview

The JavaScript SDK wraps Playwright's browser automation capabilities with AgentQL's intelligent querying layer. This combination allows developers to:

- Query web pages using natural language descriptions
- Extract structured data without relying on CSS selectors or XPath
- Build resilient automation scripts that adapt to UI changes
- Execute queries across multiple browser contexts simultaneously

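Source: [examples/js/package.json:1-28](https://github.com/tinyfish-io/agentql/blob/main/examples/js/package.json)

### Example Project Dependencies

The JavaScript examples declare the following runtime dependencies:

| Dependency | Version | Purpose |
|------------|---------|---------|
| agentql | latest | AgentQL JavaScript SDK |
| playwright | ^1.48.2 | Browser automation framework |
| playwright-dompath | ^0.0.7 | DOM path resolution |
| openai | ^4.70.1 | OpenAI API client used by the examples (AgentQL queries themselves are processed by the AgentQL service) |

Source: [examples/js/package.json:18-22](https://github.com/tinyfish-io/agentql/blob/main/examples/js/package.json)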

## Installation

### Prerequisites

- Node.js environment
- Playwright browsers installed

### Setup

```javascript
const { wrap, configure } = require('agentql');
const { chromium } = require('playwright');
```

### Configuration

Configure the SDK with your API key:

```javascript
configure({
  apiKey: process.env.AGENTQL_API_KEY, // Optional: when omitted, the SDK falls back to the AGENTQL_API_KEY environment variable
});
```

Source: [examples/js/get-by-prompt/main.js:10-12](https://github.com/tinyfish-io/agentql/blob/main/examples/js/get-by-prompt/main.js)

## Core API

### Wrapping a Playwright Page

The `wrap()` function transforms a standard Playwright `Page` object into an AgentQL-enabled page that supports natural language queries:

```javascript
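const { wrap } = require('agentql');
const { chromium } = require('playwright');

async function main() {
  const browser = await chromium.launch({ headless: false });
  // wrap() returns a Promise that resolves to an AgentQL-enabled page
  const page = await wrap(await browser.newPage());

  await page.goto('https://example.com');

  // page now supports queryData(), queryElements(), and getByPrompt()
  // alongside the regular Playwright Page API

  await browser.close();
}

main();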
```

Source: [examples/js/get-by-prompt/main.js:14-17](https://github.com/tinyfish-io/agentql/blob/main/examples/js/get-by-prompt/main.js)

### getByPrompt Method

The `getByPrompt()` method locates a single element from a natural language description, which makes it the most direct way to find an element to interact with:

```javascript
// Locate a sign up button by describing what it does
const signUpBtn = await page.getByPrompt('Sign up button');

// Click the element if found
if (signUpBtn) {
  await signUpBtn.click();
}
```

Source: [examples/js/get-by-prompt/main.js:24-30](https://github.com/tinyfish-io/agentql/blob/main/examples/js/get-by-prompt/main.js)

### queryData Method

The `queryData()` method extracts structured data from the page using AgentQL's query language:

```javascript
const query = `
{
    products[] {
        name
        model
        sku
        price(integer)
    }
}
`;

const data = await page.queryData(query);
console.log(data.products);
```

Source: [examples/js/collect-pricing-data/main.js:12-23](https://github.com/tinyfish-io/agentql/blob/main/examples/js/collect-pricing-data/main.js)

## AgentQL Query Language

The query language uses a GraphQL-like syntax to define the structure of desired data. Queries are processed by LLMs to find matching elements on the page.

### Basic Query Structure

```javascript
const query = `
{
    items[]
    {
        published_date
        entry
        author
        outlet
        url
    }
}
`;
```

Source: [examples/js/news-aggregator/main.js:10-18](https://github.com/tinyfish-io/agentql/blob/main/examples/js/news-aggregator/main.js)

### List Extraction

Use the `[]` notation to query arrays of items:

```javascript
const query = `
{
    products[] {
        name
        price
    }
}
`;
```

Source: [examples/js/collect-pricing-data/main.js:12-18](https://github.com/tinyfish-io/agentql/blob/main/examples/js/collect-pricing-data/main.js)

### Data Transforms

Apply transforms within queries to modify extracted values:

```javascript
const query = `
{
    items[] {
        published_date(convert to XX/XX/XXXX format)
        entry(title or post if no title is available)
    }
}
`;
```

Source: [examples/js/news-aggregator/main.js:10-15](https://github.com/tinyfish-io/agentql/blob/main/examples/js/news-aggregator/main.js)

### Type Conversions

Specify data types for extracted values:

```javascript
const query = `
{
    products[] {
        name
        price(integer)
    }
}
`;
```

Source: [examples/js/collect-pricing-data/main.js:15-17](https://github.com/tinyfish-io/agentql/blob/main/examples/js/collect-pricing-data/main.js)

### Fallback Values

Handle missing data gracefully:

```javascript
const query = `
{
    items[] {
        author(person's name; return "n/a" if not available)
        outlet(the original platform it is posted on; if no platform is listed, use the root domain of the url)
    }
}
`;
```

Source: [examples/js/news-aggregator/main.js:15-18](https://github.com/tinyfish-io/agentql/blob/main/examples/js/news-aggregator/main.js)

## Common Use Cases

### Searching and Filtering

```javascript
async function searchProduct(page, product, minPrice, maxPrice) {
  // Find search input using natural language
  const searchInput = await page.getByPrompt('the search input field');
  if (!searchInput) {
    console.log('Search input field not found.');
    return false;
  }
  
  // Type with realistic delay
  await searchInput.type(product, { delay: 200 });
  await searchInput.press('Enter');

  // Fill price range filters
  const minPriceInput = await page.getByPrompt('the min price input field');
  if (minPriceInput) {
    await minPriceInput.fill(String(minPrice));
  }

  const maxPriceInput = await page.getByPrompt('the max price input field');
  if (maxPriceInput) {
    await maxPriceInput.fill(String(maxPrice));
    await maxPriceInput.press('Enter');
  }
  return true;
}
```

Source: [examples/js/collect-pricing-data/main.js:27-49](https://github.com/tinyfish-io/agentql/blob/main/examples/js/collect-pricing-data/main.js)

### Pagination Handling

```javascript
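async function goToTheNextPage(page) {
  const nextPageQuery = `
    {
        pagination {
            prev_page
            next_page
        }
    }
  `;
  // queryElements() resolves each leaf field of the query to a Playwright Locator
  const response = await page.queryElements(nextPageQuery);
  const nextPageBtn = response.pagination.next_page;

  // Sketch only (this part is not shown in the source excerpt):
  // stop when there is no visible next-page control
  if (!nextPageBtn || !(await nextPageBtn.isVisible())) {
    return false;
  }
  await nextPageBtn.click();
  await page.waitForLoadState('networkidle');
  return true;
}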
```

Source: [examples/js/collect-pricing-data/main.js:53-63](https://github.com/tinyfish-io/agentql/blob/main/examples/js/collect-pricing-data/main.js)

### Multi-Tab Data Collection

```javascript
const websiteUrls = [
  'https://bsky.app/search?q=agents+for+the+web',
  'https://dev.to/search?q=agents%20for+the+web',
  'https://hn.algolia.com/?query=agents%20for+the+web',
];

async function fetchData(context, sessionUrl) {
  const page = await wrap(await context.newPage());
  await page.goto(sessionUrl);
  const data = await page.queryData(query);
  // Process extracted data
}

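// Fetch from multiple URLs concurrently, one page per URL in the shared context
// (inside an async function, after `const context = await browser.newContext();`)
await Promise.all(websiteUrls.map((url) => fetchData(context, url)));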
```

Source: [examples/js/news-aggregator/main.js:26-41](https://github.com/tinyfish-io/agentql/blob/main/examples/js/news-aggregator/main.js)

## Workflow Diagram

```mermaid
graph TD
    A[Configure API Key] --> B[Initialize Browser]
    B --> C[Wrap Page with AgentQL]
    C --> D[Navigate to URL]
    D --> E[Execute Query or getByPrompt]
    E --> F{Query Type?}
    F -->|Data Extraction| G[queryData returns structured JSON]
    F -->|Element Interaction| H[getByPrompt returns element]
    G --> I[Process Results]
    H --> J[Interact with Element]
    J --> K[Wait for Navigation/Update]
    K --> E
    I --> L[Close Browser]
```

## Configuration Options

### Browser Launch Options

```javascript
const browser = await chromium.launch({ 
  headless: false  // or true for headless mode
});
```

Source: [examples/js/get-by-prompt/main.js:15](https://github.com/tinyfish-io/agentql/blob/main/examples/js/get-by-prompt/main.js)

### Browser Context Options

```javascript
const context = await browser.newContext();
// Create multiple pages within the same context for concurrent operations
const page1 = await context.newPage();
const page2 = await context.newPage();
```

Source: [examples/js/news-aggregator/main.js:27-31](https://github.com/tinyfish-io/agentql/blob/main/examples/js/news-aggregator/main.js)

## Development Tools

### Linting and Formatting

The examples project (`examples/js`) includes pre-configured linting and formatting scripts:

```bash
# Run ESLint
npm run lint

# Run Prettier
npm run format
```

Source: [examples/js/package.json:7-10](https://github.com/tinyfish-io/agentql/blob/main/examples/js/package.json)

### Available Dev Dependencies

| Package | Version | Purpose |
|---------|---------|---------|
| eslint | ^8.57.0 | JavaScript linting |
| eslint-config-prettier | ^9.1.0 | Disables ESLint rules that conflict with Prettier |
| prettier | ^2.8.7 | Code formatting |
| @trivago/prettier-plugin-sort-imports | ^4.3.0 | Import sorting |

Source: [examples/js/package.json:11-15](https://github.com/tinyfish-io/agentql/blob/main/examples/js/package.json)

## Security Overrides

The examples' `package.json` uses npm `overrides` to pin patched versions of transitive dependencies:

```json
"overrides": {
  "axios": "^1.15.0",
  "flatted": "^3.4.2",
  "follow-redirects": "^1.16.0",
  "lodash": "^4.18.0",
  "minimatch": "^3.1.3"
}
```

Source: [examples/js/package.json:23-28](https://github.com/tinyfish-io/agentql/blob/main/examples/js/package.json)

## Known Limitations

### Cloudflare Browser Rendering Compatibility

There is an open issue regarding compatibility with Cloudflare's Browser Rendering in edge environments. Cloudflare Workers use a restricted Node.js runtime that may not fully support all Playwright and AgentQL features. Developers targeting Cloudflare Workers should be aware of potential limitations with browser instance access.

Source: [GitHub Issue #128](https://github.com/tinyfish-io/agentql/issues/128)

### Element Resolution Edge Cases

In some cases, elements may be resolved as generic containers (e.g., `<span>`) rather than semantic elements. This can affect element location accuracy. When encountering such issues, try using more specific prompt descriptions or combining with Playwright's native locators.

Source: [GitHub Issue #121](https://github.com/tinyfish-io/agentql/issues/121)

## Additional Resources

| Resource | Description |
|----------|-------------|
| [Installation Guide](https://docs.agentql.com/javascript-sdk/installation) | Full SDK installation instructions |
| [Query Language Docs](https://docs.agentql.com/agentql-query/query-intro) | Complete AgentQL query language reference |
| [Chrome Extension](https://docs.agentql.com/installation/chrome-extension-installation) | Debug and test queries in real-time |
| [Playground](https://playground.agentql.com/) | Interactive query testing environment |
| [Examples Directory](https://github.com/tinyfish-io/agentql/tree/main/examples/js) | Complete list of JavaScript examples |

---

<a id='page-rest-api'></a>

## REST API

### Related Pages

Related topics: [Python SDK](#page-python-sdk), [JavaScript SDK](#page-javascript-sdk)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [README.md](https://github.com/tinyfish-io/agentql/blob/main/README.md)
- [examples/js/package.json](https://github.com/tinyfish-io/agentql/blob/main/examples/js/package.json)
- [examples/python/news-aggregator/main.py](https://github.com/tinyfish-io/agentql/blob/main/examples/python/news-aggregator/main.py)
- [examples/python/news-aggregator/main_sync.py](https://github.com/tinyfish-io/agentql/blob/main/examples/python/news-aggregator/main_sync.py)
- [examples/python/first_steps/main.py](https://github.com/tinyfish-io/agentql/blob/main/examples/python/first_steps/main.py)
- [examples/python/list_query_usage/main.py](https://github.com/tinyfish-io/agentql/blob/main/examples/python/list_query_usage/main.py)
</details>

# REST API

AgentQL provides a REST API as an alternative to the Python and JavaScript SDKs for executing queries without requiring a full SDK installation. The REST API enables developers to interact with the AgentQL query engine over HTTP, making it suitable for environments where SDK integration is not practical or for quick prototyping and testing.

## Overview

The REST API is one of three tool options provided by AgentQL alongside the Python SDK and JavaScript SDK. It allows executing queries against web pages without needing to set up Playwright or maintain a browser automation environment locally.

Source: [README.md](https://github.com/tinyfish-io/agentql/blob/main/README.md)

## Architecture

```mermaid
graph TD
    A[Client Application] -->|HTTP POST /v1/query-data| B[AgentQL REST API]
    B -->|Parse & Process Query| C[Query Engine]
    C -->|DOM Analysis| D[Web Page Content]
    D -->|Extracted Data| B
    B -->|JSON Response| A
    
    E[SDK Client] -->|Internal Request| B
    B -->|Same Flow| D
```

## When to Use the REST API

| Use Case | Recommended Tool | Notes |
|----------|-----------------|-------|
| Server-side scraping with Python | Python SDK | Full Playwright integration |
| Browser automation in Node.js | JavaScript SDK | Native async support |
| Quick testing/prototyping | REST API | No SDK installation required |
| Edge environments | REST API | Lightweight HTTP requests only |
| External integrations | REST API | Language-agnostic interface |

## Core Capabilities

### Query Execution

The REST API accepts the same AgentQL query language as the SDKs. Queries extract structured data from web pages. Because there is no local browser session, the API returns data rather than element handles for interaction.

Example request body (a `query` string plus the `url` of the page to query):

```json
{
    "query": "{ items[] { title price url } }",
    "url": "https://example.com/products"
}
```

### Data Extraction

The API returns structured JSON data matching the shape defined in the query. Lists, nested objects, and type conversions are supported.

## SDK vs REST API Comparison

| Feature | Python SDK | JavaScript SDK | REST API |
|---------|-----------|----------------|----------|
| Browser Automation | Yes | Yes | No |
| Query Execution | Yes | Yes | Yes |
| Installation Required | Yes | Yes | No |
| Authentication | API key | API key | API key (HTTP header) |
| Real-time Interaction | Yes | Yes | No |
| Pagination Handling | Manual | Manual | Manual |
| Rate Limiting | Client-side | Client-side | Server-enforced |

Source: [README.md](https://github.com/tinyfish-io/agentql/blob/main/README.md)

## Configuration Options

When using the REST API, authentication and request configuration are handled through HTTP headers:

| Header | Description | Required |
|--------|-------------|----------|
| `X-API-Key` | Your AgentQL API key | Yes |
| `Content-Type` | Request payload format (`application/json`) | Yes |
| `Accept` | Response format (`application/json`) | Optional |

## SDK Dependencies and Requirements

By contrast, the SDKs run a local browser and require installing packages such as the following:

### JavaScript SDK

Source: [examples/js/package.json](https://github.com/tinyfish-io/agentql/blob/main/examples/js/package.json)

```json
{
  "dependencies": {
    "agentql": "latest",
    "playwright": "^1.48.2",
    "playwright-dompath": "^0.0.7"
  }
}
```

### Python SDK

The Python SDK uses Playwright as its underlying browser automation framework and communicates with the AgentQL query service.

Source: [examples/python/news-aggregator/main.py](https://github.com/tinyfish-io/agentql/blob/main/examples/python/news-aggregator/main.py)

```python
from playwright.async_api import async_playwright
import agentql
```

## Common Usage Patterns

### Structured Data Extraction

Both SDK and REST API approaches support extracting structured lists from pages:

Source: [examples/python/list_query_usage/main.py](https://github.com/tinyfish-io/agentql/blob/main/examples/python/list_query_usage/main.py)

```python
QUERY = """
{
    products[]
    {
        name
        price(integer)
    }
}
"""
```

### Multi-Source Aggregation

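The same query can be run against several sites and the results merged. The Python news-aggregator example does this with the SDK, starting from a list of source URLs: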

Source: [examples/python/news-aggregator/main.py](https://github.com/tinyfish-io/agentql/blob/main/examples/python/news-aggregator/main.py)

```python
WEBSITE_URLS = [
    "https://bsky.app/search?q=agents+for+the+web",
    "https://dev.to/search?q=agents%20for%20the+web",
    "https://hn.algolia.com/?dateRange=last24h&query=agents%20for%20the%20web",
]
```
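
Different sources often surface the same story. The following minimal sketch merges per-site `items[]` lists into one list without duplicates, keyed on the `url` field from the news query. `merge_items` is a hypothetical helper, and the sample data is illustrative.

```python
def merge_items(responses, key="url"):
    """Concatenate each response's items, keeping the first item seen for each `key` value."""
    seen = set()
    merged = []
    for response in responses:
        for item in response.get("items") or []:
            item_key = item.get(key)
            if item_key in seen:
                continue  # duplicate from an earlier source
            if item_key is not None:  # items without a key are always kept
                seen.add(item_key)
            merged.append(item)
    return merged


site_a = {"items": [{"url": "u1", "entry": "A"}, {"url": "u2", "entry": "B"}]}
site_b = {"items": [{"url": "u2", "entry": "B again"}, {"url": None, "entry": "C"}]}
merged = merge_items([site_a, site_b])
```

The order of `responses` decides which copy of a duplicate is kept, so list the preferred source first.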

## Authentication and Security

The REST API uses API key authentication. Pass the key in the `X-API-Key` header and send the query and target URL as JSON to the `query-data` endpoint:

```bash
curl -X POST https://api.agentql.com/v1/query-data \
  -H "X-API-Key: $AGENTQL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"query": "{ title }", "url": "https://example.com"}'
```

## Limitations and Considerations

### Edge Environment Compatibility

The REST API is particularly useful in edge environments where full SDK installation is not possible. However, issues have been reported when combining JavaScript SDK with Cloudflare's Browser Rendering feature, as some Node.js APIs may not be available in edge runtime environments.

Source: [Issue #128: AgentQL (JS) x Cloudflare's Browser Rendering](https://github.com/tinyfish-io/agentql/issues/128)

### Element Resolution

When using queries that resolve elements, some elements may be resolved as generic containers (like `<span>`) rather than the expected semantic elements. This can affect data extraction accuracy.

Source: [Issue #121: querying element resolved as useless span](https://github.com/tinyfish-io/agentql/issues/121)

### Documentation Links

When referencing examples or tutorials, ensure you use the correct documentation paths. Some older links may point to incorrect directories.

Source: [Issue #64: Invalid Link | Documentation > Examples > Collab](https://github.com/tinyfish-io/agentql/issues/64)

## Integration with Agent Frameworks

The REST API can be integrated with various agent frameworks as a lightweight alternative to SDK-based approaches. External services like run.pay have expressed interest in using AgentQL for autonomous AI agents to perform web interactions.

Source: [Issue #153: Monetize AgentQL with run.pay](https://github.com/tinyfish-io/agentql/issues/153)

## See Also

- [Python SDK Documentation](https://docs.agentql.com/python-sdk/installation)
- [JavaScript SDK Documentation](https://docs.agentql.com/javascript-sdk/installation)
- [REST API Reference](https://docs.agentql.com/rest-api/api-reference)
- [AgentQL Query Language](https://docs.agentql.com/agentql-query/query-intro)
- [Quick Start Guide](https://docs.agentql.com/quick-start)

---

<a id='page-query-language'></a>

## AgentQL Query Language

### Related Pages

Related topics: [Query Examples and Patterns](#page-query-examples)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [README.md](https://github.com/tinyfish-io/agentql/blob/main/README.md)
- [examples/python/first_steps/main.py](https://github.com/tinyfish-io/agentql/blob/main/examples/python/first_steps/main.py)
- [examples/python/list_query_usage/README.md](https://github.com/tinyfish-io/agentql/blob/main/examples/python/list_query_usage/README.md)
- [examples/js/list-query-usage/README.md](https://github.com/tinyfish-io/agentql/blob/main/examples/js/list-query-usage/README.md)
- [examples/js/first-steps/README.md](https://github.com/tinyfish-io/agentql/blob/main/examples/js/first-steps/README.md)
- [examples/python/collect_ecommerce_pricing_data/README.md](https://github.com/tinyfish-io/agentql/blob/main/examples/python/collect_ecommerce_pricing_data/README.md)
- [examples/python/collect_paginated_news_headlines/README.md](https://github.com/tinyfish-io/agentql/blob/main/examples/python/collect_paginated_news_headlines/README.md)
- [examples/js/collect-paginated-news-headlines/README.md](https://github.com/tinyfish-io/agentql/blob/main/examples/js/collect-paginated-news-headlines/README.md)
</details>

# AgentQL Query Language

## Overview

The AgentQL Query Language is a domain-specific query language designed to extract structured data and locate DOM elements on web pages using natural language descriptions. It serves as the core abstraction layer that enables AI agents and LLMs to interact with web content in a robust, maintainable way that survives UI changes.

AgentQL queries are declarative, resemble a subset of GraphQL syntax, and support both element location and data extraction within a single unified syntax. Source: [README.md:1-10](https://github.com/tinyfish-io/agentql/blob/main/README.md)

## Core Concepts

### Query Types

AgentQL distinguishes between two primary query operations:

| Query Type | Purpose | SDK Method | Returns |
|------------|---------|------------|---------|
| Element Query | Locate DOM elements for interaction | `query_elements()` | Response object whose fields are Playwright Locators |
| Data Query | Extract structured data from the page | `query_data()` | Dictionary/object with extracted values |

Source: [examples/python/first_steps/main.py:35-55](https://github.com/tinyfish-io/agentql/blob/main/examples/python/first_steps/main.py)

### Natural Language Selectors

Unlike traditional CSS selectors or XPath, AgentQL uses natural language to describe what elements or data to find. This approach provides:

- **Intuitive element discovery**: Describe elements by their purpose or content rather than markup structure
- **Cross-site compatibility**: The same query can work across different websites with similar content
- **Resilience to UI changes**: Because queries describe meaning rather than markup, they often keep working when a page's structure changes, without selector maintenance

Source: [README.md:8-15](https://github.com/tinyfish-io/agentql/blob/main/README.md)

## Query Syntax Reference

### Basic Structure

Queries are defined as multi-line strings using a GraphQL-like syntax:

```graphql
{
    element_name
}
```

Source: [examples/python/first_steps/main.py:22-25](https://github.com/tinyfish-io/agentql/blob/main/examples/python/first_steps/main.py)

### Object and Field Selection

Nested objects are queried using brace notation. Fields within objects return their text content or attribute values:

```graphql
{
    price_currency
    products[] {
        name
        price
    }
}
```

Source: [examples/python/first_steps/main.py:28-35](https://github.com/tinyfish-io/agentql/blob/main/examples/python/first_steps/main.py)

### Array Syntax

The `[]` suffix denotes arrays/lists of items. This syntax extracts multiple items matching the query pattern:

```graphql
{
    products[] {
        name
        price
    }
}
```

Source: [examples/python/list_query_usage/README.md:1-15](https://github.com/tinyfish-io/agentql/blob/main/examples/python/list_query_usage/README.md)
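
Because a query is a plain string, list queries can also be generated from a field list. This is useful, for example, when several scripts reuse the same fields. The following is a minimal sketch. `build_list_query` is a hypothetical helper, and its output simply follows the brace and `[]` conventions shown above.

```python
def build_list_query(list_name, fields, indent="    "):
    """Return a multi-line AgentQL list query with one field (or field(transform)) per line."""
    lines = ["{", f"{indent}{list_name}[] {{"]
    lines += [f"{indent * 2}{field}" for field in fields]
    lines += [f"{indent}}}", "}"]
    return "\n".join(lines)


QUERY = build_list_query("products", ["name", "price(integer)"])
```

The generated string can be passed directly to `page.query_data(QUERY)`.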

### Transforms

Transforms are applied inline to convert extracted data to specific types or formats. The transform is written in parentheses directly after the field name:

```graphql
{
    products[] {
        name
        price(integer)
    }
}
```

In this example, `price(integer)` instructs AgentQL to extract the price text and convert it to an integer. Source: [examples/python/first_steps/main.py:33](https://github.com/tinyfish-io/agentql/blob/main/examples/python/first_steps/main.py)

### Natural Language Prompts

For element location, you can use free-form natural language descriptions via the `get_by_prompt()` method:

```python
NATURAL_LANGUAGE_PROMPT = "Button to display Qwilfish page"
qwilfish_page_btn = page.get_by_prompt(NATURAL_LANGUAGE_PROMPT)
```

Source: [examples/python/first_steps/main.py:37-39](https://github.com/tinyfish-io/agentql/blob/main/examples/python/first_steps/main.py)

## Usage Patterns

### Python SDK Pattern

```python
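import agentql
from playwright.sync_api import sync_playwright

URL = "https://scrapeme.live/shop"  # example target, also used in the list-query example

# Query used to locate an element for interaction
SEARCH_BOX_QUERY = """
{
    search_product_box
}
"""

# Query used to extract data
PRODUCT_DATA_QUERY = """
{
    price_currency
    products[] {
        name
        price(integer)
    }
}
"""

with sync_playwright() as playwright, playwright.chromium.launch(headless=False) as browser:
    # Wrap the Playwright page to add AgentQL query methods
    page = agentql.wrap(browser.new_page())
    page.goto(URL)

    # Locate the search box and type into it
    response = page.query_elements(SEARCH_BOX_QUERY)
    response.search_product_box.type("fish", delay=200)
    page.keyboard.press("Enter")
    page.wait_for_load_state()

    # Extract structured data from the results page
    data = page.query_data(PRODUCT_DATA_QUERY)
    print(data)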
```

Source: [examples/python/first_steps/main.py:1-60](https://github.com/tinyfish-io/agentql/blob/main/examples/python/first_steps/main.py)

### JavaScript SDK Pattern

```javascript
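const { wrap } = require('agentql');
const { chromium } = require('playwright');

const url = 'https://scrapeme.live/shop'; // example target

(async () => {
  const browser = await chromium.launch();
  // wrap() is async and resolves to an AgentQL-enabled page
  const page = await wrap(await browser.newPage());
  await page.goto(url);

  // Same query syntax as the Python SDK
  const response = await page.queryData(`
    {
        price_currency
        products[] {
            name
            price
        }
    }
  `);
  console.log(response);

  await browser.close();
})();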
```

Source: [examples/js/first-steps/README.md:1-20](https://github.com/tinyfish-io/agentql/blob/main/examples/js/first-steps/README.md)

## Common Use Cases

### Collecting Paginated Data

For paginated content, queries can be combined with navigation logic to collect data across multiple pages:

```python
# Extract data from current page
data = page.query_data(PRODUCT_DATA_QUERY)
all_data.extend(data)

# Navigate to next page
next_button = page.query_elements("{ next_page_button }")
next_button.click()
```

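Source: [examples/python/collect_paginated_news_headlines/README.md:1-20](https://github.com/tinyfish-io/agentql/blob/main/examples/python/collect_paginated_news_headlines/README.md)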

### Form Interaction

Queries locate form fields and buttons for automated interaction:

```graphql
{
    username_field
    password_field
    submit_button
}
```

Source: [examples/js/submit-form/README.md:1-20](https://github.com/tinyfish-io/agentql/blob/main/examples/js/submit-form/README.md)

### Web Scraping with Structured Output

Queries define the exact shape of extracted data:

```python
QUERY = """
{
    price_currency
    products[] {
        name
        price(integer)
    }
}
"""
data = page.query_data(QUERY)
# Example result shape: { "price_currency": "USD", "products": [{ "name": "Item", "price": 29 }] }
```

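Source: [examples/python/collect_ecommerce_pricing_data/README.md:1-20](https://github.com/tinyfish-io/agentql/blob/main/examples/python/collect_ecommerce_pricing_data/README.md)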
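
Because the query fixes the shape of the response, downstream code can map results onto typed objects. The following minimal sketch uses standard-library dataclasses and assumes the `price_currency` plus `products[] { name price }` shape from the query above. The `Product`, `Catalog`, and `to_catalog` names and the sample dict are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Product:
    name: str
    price: int


@dataclass
class Catalog:
    price_currency: str
    products: list[Product]


def to_catalog(data):
    """Convert a query_data() result into typed objects; raises if a field is missing."""
    return Catalog(
        price_currency=data["price_currency"],
        products=[Product(name=p["name"], price=int(p["price"])) for p in data["products"]],
    )


catalog = to_catalog({"price_currency": "USD", "products": [{"name": "Item", "price": 29}]})
```

Failing loudly on a missing field surfaces page changes early instead of passing `None` values further down the pipeline.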

## Architecture

```mermaid
graph TD
    A[Developer writes<br/>AgentQL Query] --> B[AgentQL SDK sends<br/>query to API]
    B --> C[LLM interprets<br/>query semantically]
    C --> D[AgentQL returns<br/>element locators<br/>or extracted data]
    D --> E[SDK provides<br/>typed response]
    E --> F[query_elements<br/>returns Locators]
    E --> G[query_data<br/>returns structured data]
    F --> H[Playwright<br/>interacts with DOM]
    G --> I[Structured dict<br/>for downstream use]
    
    style A fill:#e1f5fe
    style H fill:#fff3e0
    style I fill:#e8f5e9
```

## Key Features Summary

| Feature | Description |
|---------|-------------|
| Natural language selectors | Describe elements by purpose, not CSS/XPath |
| Structured output | Define exact data shape in queries |
| Inline transforms | Convert data types during extraction |
| Array support | Query lists with `[]` syntax |
| Cross-site compatibility | The same query often works across sites with similar content |
| Resilience to UI changes | Queries describe meaning rather than markup, so they tend to survive layout changes |

Source: [README.md:8-15](https://github.com/tinyfish-io/agentql/blob/main/README.md)

## Integration Points

### Playwright Integration

AgentQL wraps Playwright page objects to provide query capabilities while preserving full Playwright API access:

```python
page = agentql.wrap(browser.new_page())
# Use both AgentQL and Playwright methods
response = page.query_elements(QUERY)
response.some_element.click()  # Playwright API
page.keyboard.press("Enter")   # Playwright API
```

Source: [examples/python/first_steps/main.py:41-48](https://github.com/tinyfish-io/agentql/blob/main/examples/python/first_steps/main.py)

### SDK Availability

| SDK | Installation Guide | Use Case |
|-----|-------------------|----------|
| Python SDK | [docs.agentql.com](https://docs.agentql.com/python-sdk/installation) | Automation, scraping |
| JavaScript SDK | [docs.agentql.com](https://docs.agentql.com/javascript-sdk/installation) | Node.js automation |

Source: [README.md:20-30](https://github.com/tinyfish-io/agentql/blob/main/README.md)

## Best Practices

1. **Use descriptive field names** — Match query field names to content purpose rather than HTML attributes
2. **Apply transforms early** — Convert data types in queries rather than post-processing
3. **Test with debugger extension** — Use the [AgentQL Debugger Chrome Extension](https://chromewebstore.google.com/detail/agentql-debugger/idnejmodeepdobpinkkgpkeabkabhhej) to refine queries interactively
4. **Leverage natural language prompts** — For complex element location, `get_by_prompt()` often provides better resilience than structured queries

Source: [examples/python/list_query_usage/README.md:1-20](https://github.com/tinyfish-io/agentql/blob/main/examples/python/list_query_usage/README.md)

## Debugging Queries

Install the [AgentQL Debugger Chrome Extension](https://docs.agentql.com/installation/chrome-extension-installation) to:

- Test queries in real-time on live sites
- View element matches and confidence scores
- Export optimized queries to Python or JavaScript

Source: [examples/python/first_steps/main.py:1-10](https://github.com/tinyfish-io/agentql/blob/main/examples/python/first_steps/main.py)

## Related Documentation

- [AgentQL Query Language Docs](https://docs.agentql.com/agentql-query/query-intro)
- [Python SDK Installation](https://docs.agentql.com/python-sdk/installation)
- [JavaScript SDK Installation](https://docs.agentql.com/javascript-sdk/installation)
- [REST API Reference](https://docs.agentql.com/rest-api/api-reference)
- [Playground](https://playground.agentql.com/) for interactive query testing

---

<a id='page-query-examples'></a>

## Query Examples and Patterns

### Related Pages

Related topics: [AgentQL Query Language](#page-query-language), [Data Collection Patterns](#page-data-collection)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [examples/python/list_query_usage/main.py](https://github.com/tinyfish-io/agentql/blob/main/examples/python/list_query_usage/main.py)
- [examples/python/first_steps/main.py](https://github.com/tinyfish-io/agentql/blob/main/examples/python/first_steps/main.py)
- [examples/python/news-aggregator/main.py](https://github.com/tinyfish-io/agentql/blob/main/examples/python/news-aggregator/main.py)
- [examples/js/news-aggregator/main.js](https://github.com/tinyfish-io/agentql/blob/main/examples/js/news-aggregator/main.js)
- [examples/python/infinite_scroll/README.md](https://github.com/tinyfish-io/agentql/blob/main/examples/python/infinite_scroll/README.md)
- [examples/js/list-query-usage/README.md](https://github.com/tinyfish-io/agentql/blob/main/examples/js/list-query-usage/README.md)
- [examples/python/collect_paginated_news_headlines/README.md](https://github.com/tinyfish-io/agentql/blob/main/examples/python/collect_paginated_news_headlines/README.md)
- [examples/js/collect-paginated-news-headlines/README.md](https://github.com/tinyfish-io/agentql/blob/main/examples/js/collect-paginated-news-headlines/README.md)
</details>

# Query Examples and Patterns

AgentQL provides a powerful query language that enables AI agents and LLMs to interact with web pages in a natural, resilient way. This page covers practical examples and common patterns for writing effective queries to extract data and locate elements on web pages.

## Overview

AgentQL queries are structured JSON-like expressions that define what data to extract or what elements to locate on a webpage. The query language supports:

- **Natural language selectors** that find elements based on semantic meaning
- **Structured data extraction** with typed transformations
- **List/array queries** for extracting multiple items
- **Cross-site compatibility** for reuse across similar websites

Source: [README.md](https://github.com/tinyfish-io/agentql/blob/main/README.md)

## Core Query Methods

After you wrap a Playwright page object, AgentQL gives it three primary methods:

| Method | Purpose | Returns |
|--------|---------|---------|
| `query_data()` | Extract structured data from the page | Dictionary shaped like the query |
| `query_elements()` | Locate DOM elements for interaction | Response object whose fields are Playwright locators |
| `get_by_prompt()` | Find a single element from a natural language prompt | Playwright locator, or `None` if nothing matches |

Source: [examples/python/first_steps/main.py:54-77](https://github.com/tinyfish-io/agentql/blob/main/examples/python/first_steps/main.py)

### Python SDK Usage

```python
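import agentql
from playwright.sync_api import sync_playwright

URL = "https://scrapeme.live/shop"
PRODUCT_DATA_QUERY = """
{
    products[] {
        name
        price(integer)
    }
}
"""
SEARCH_BOX_QUERY = """
{
    search_products_box
}
"""

def main():
    with sync_playwright() as playwright, playwright.chromium.launch() as browser:
        page = agentql.wrap(browser.new_page())
        page.goto(URL)

        # Extract data: returns a dict shaped like the query
        data = page.query_data(PRODUCT_DATA_QUERY)

        # Locate elements: each response field is a Playwright Locator
        response = page.query_elements(SEARCH_BOX_QUERY)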
```

Source: [examples/python/first_steps/main.py:1-45](https://github.com/tinyfish-io/agentql/blob/main/examples/python/first_steps/main.py)

### JavaScript SDK Usage

```javascript
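const { wrap } = require('agentql');
const { chromium } = require('playwright');
const URL = 'https://scrapeme.live/shop';
const query = `
{
    products[] {
        name
        price(integer)
    }
}
`;

async function main() {
    const browser = await chromium.launch();
    const page = await wrap(await browser.newPage());
    await page.goto(URL);
    const data = await page.queryData(query); // plain object shaped like the query
    await browser.close();
}
main();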
```

Source: [examples/js/news-aggregator/main.js:1-20](https://github.com/tinyfish-io/agentql/blob/main/examples/js/news-aggregator/main.js)

## List Queries

List queries allow extraction of multiple items from a page, such as product listings, news headlines, or any repeating content.

### Basic List Query Pattern

Use the `[]` syntax to query arrays of items:

```
{
    products[]
    {
        name
        price(integer)
    }
}
```

Source: [examples/python/list_query_usage/main.py:15-21](https://github.com/tinyfish-io/agentql/blob/main/examples/python/list_query_usage/main.py)
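
Because `query_data()` returns a dictionary that mirrors the query, a `products[]` field arrives as a list of dictionaries, one per product. The following is a minimal sketch of turning that list into table rows. `rows_from_list_response` is a hypothetical helper, not part of the SDK, and filling missing fields with `None` is an assumption about how you want gaps handled.

```python
from typing import Any, Dict, List

# Hypothetical helper (not part of the AgentQL SDK)
def rows_from_list_response(response: Dict[str, Any], list_field: str, columns: List[str]) -> List[List[Any]]:
    """Turn a `field[]` list from a query_data() result into table rows."""
    rows = []
    # Treat a missing or empty list field as "no rows"
    for item in response.get(list_field) or []:
        # A field absent from an item becomes None instead of raising KeyError
        rows.append([item.get(column) for column in columns])
    return rows
```

For the query above, `rows_from_list_response(page.query_data(QUERY), "products", ["name", "price"])` yields one `[name, price]` row per product.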

### Python List Query Example

```python
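import agentql
from playwright.sync_api import sync_playwright

URL = "https://scrapeme.live/shop"

QUERY = """
{
    products[]
    {
        name
        price(integer)
    }
}
"""

def main():
    with sync_playwright() as playwright, playwright.chromium.launch(headless=True) as browser:
        page = agentql.wrap(browser.new_page())
        page.goto(URL)

        response = page.query_data(QUERY)

    # Iterate over extracted products and write one row each
    # (products.csv is an illustrative output path)
    with open("products.csv", "w", encoding="utf-8") as file:
        file.write("Name,Price\n")
        for product in response["products"]:
            file.write(f"{product['name']},{product['price']}\n")

if __name__ == "__main__":
    main()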
```

Source: [examples/python/list_query_usage/main.py:1-40](https://github.com/tinyfish-io/agentql/blob/main/examples/python/list_query_usage/main.py)

### JavaScript List Query Example

```javascript
const query = `
{
    items(might be articles, posts, tweets)[]
    {
        published_date(convert to XX/XX/XXXX format)
        entry(title or post if no title is available)
        author(person's name; return "n/a" if not available)
        outlet(the original platform it is posted on)
        url
    }
}
`;

const data = await page.queryData(query);
```

Source: [examples/js/news-aggregator/main.js:10-19](https://github.com/tinyfish-io/agentql/blob/main/examples/js/news-aggregator/main.js)

## Data Transformations

AgentQL supports inline transformations within queries to convert data types or format values.

### Type Conversions

Use `(type)` syntax to convert extracted values:

| Transformation | Example | Description |
|----------------|---------|-------------|
| `(integer)` | `price(integer)` | Convert string to integer |
| `(float)` | `rating(float)` | Convert to decimal number |
| `(string)` | `date(string)` | Ensure string output |

Source: [examples/python/first_steps/main.py:34-35](https://github.com/tinyfish-io/agentql/blob/main/examples/python/first_steps/main.py)
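
If you want to confirm that the returned values really have the requested types before you use them, a small check over the `query_data()` result is enough. The following is a minimal sketch. `find_type_mismatches` is a hypothetical helper, and the expected types are supplied by you rather than read from the query.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical helper (not part of the AgentQL SDK)
def find_type_mismatches(items: List[Dict[str, Any]], expected: Dict[str, type]) -> List[Tuple[int, str, Any]]:
    """Return (index, field, value) for each field whose value is not of the expected type."""
    mismatches = []
    for index, item in enumerate(items):
        for field, field_type in expected.items():
            value = item.get(field)
            # bool is a subclass of int in Python, so reject it explicitly for int fields
            wrong_bool = field_type is int and isinstance(value, bool)
            if not isinstance(value, field_type) or wrong_bool:
                mismatches.append((index, field, value))
    return mismatches
```

For example, `find_type_mismatches(data["products"], {"price": int})` lists every product whose price did not come back as an integer. A whole-number rating such as `4` fails a `float` check, so adjust `expected` if that matters for your data.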

### Format Instructions

Include format hints directly in the query:

```
{
    published_date(convert to XX/XX/XXXX format)
    entry(title or post if no title is available)
}
```

Source: [examples/js/news-aggregator/main.js:12-13](https://github.com/tinyfish-io/agentql/blob/main/examples/js/news-aggregator/main.js)
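
The hint `XX/XX/XXXX` does not say whether the month or the day comes first, so a value such as `03/04/2025` can be read two ways. The following is a minimal sketch that assumes the query asks for `MM/DD/YYYY` explicitly. It parses the returned strings and sets aside anything that does not fit, including `"n/a"`. `parse_published_dates` is a hypothetical helper.

```python
from datetime import date, datetime
from typing import Any, Dict, List, Tuple

# Hypothetical helper (not part of the AgentQL SDK); assumes MM/DD/YYYY output
def parse_published_dates(items: List[Dict[str, Any]], field: str = "published_date", fmt: str = "%m/%d/%Y") -> Tuple[List[date], List[Any]]:
    """Split items into parsed dates and the raw values that did not match `fmt`."""
    parsed, rejected = [], []
    for item in items:
        raw = item.get(field)
        try:
            parsed.append(datetime.strptime(raw, fmt).date())
        except (TypeError, ValueError):
            # None, "n/a", or a string in some other format
            rejected.append(raw)
    return parsed, rejected
```

Reviewing the `rejected` list is a quick way to spot values where the model did not follow the format hint.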

## Natural Language Element Location

The `get_by_prompt()` method locates a single element from a natural language description. The script therefore does not depend on CSS selectors or IDs that can change when the UI is redesigned.

### Finding Elements with Prompts

```python
from agentql.ext.playwright.sync_api import Page

NATURAL_LANGUAGE_PROMPT = "Button to display Qwilfish page"

def _add_qwilfish_to_cart(page: Page):
    """Add Qwilfish to cart with AgentQL Smart Locator API."""
    # Find DOM element using natural language prompt
    qwilfish_page_btn = page.get_by_prompt(NATURAL_LANGUAGE_PROMPT)

    # Interact with the element using Playwright API (skip if nothing matched)
    if qwilfish_page_btn:
        qwilfish_page_btn.click()
```

Source: [examples/python/first_steps/main.py:79-88](https://github.com/tinyfish-io/agentql/blob/main/examples/python/first_steps/main.py)
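
Depending on the SDK version, `get_by_prompt()` may return `None` instead of raising when nothing matches the prompt. This is an assumption worth checking against your installed version. The following is a minimal sketch of a helper that turns that case into an explicit error, so a script stops at the failing step rather than later with an `AttributeError`. `click_by_prompt` and `PromptPage` are hypothetical names.

```python
from typing import Any, Protocol

# Hypothetical types and helper (not part of the AgentQL SDK)
class PromptPage(Protocol):
    """Anything with a get_by_prompt() method, such as an AgentQL-wrapped page."""
    def get_by_prompt(self, prompt: str) -> Any: ...

def click_by_prompt(page: PromptPage, prompt: str) -> None:
    """Click the element matching `prompt`, failing loudly if nothing matched."""
    element = page.get_by_prompt(prompt)
    if element is None:
        raise LookupError(f"No element matched prompt: {prompt!r}")
    element.click()
```

With a wrapped page, `click_by_prompt(page, NATURAL_LANGUAGE_PROMPT)` replaces the lookup-then-click pair in `_add_qwilfish_to_cart`.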

## Handling Dynamic Content

### Infinite Scroll Patterns

Pages that load content based on scroll position require simulating scroll events:

```python
def key_press_end_scroll(page):
    """Scroll to the end of the page by pressing End key."""
    page.keyboard.press("End")

def mouse_wheel_scroll(page):
    """Alternative scroll using mouse wheel for different page behaviors."""
    page.mouse.wheel(0, 3000)
```

Source: [examples/python/infinite_scroll/README.md](https://github.com/tinyfish-io/agentql/blob/main/examples/python/infinite_scroll/README.md)

> **Note**: Scrolling to the end of a page by pressing the `End` key is not always reliable. Some pages have multiple scrollable areas, or the `End` key may be mapped to different functions. Test both `key_press_end_scroll()` and `mouse_wheel_scroll()` to find what works for your target site.
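
Either scroll function only triggers loading, so a script still has to decide when to stop. The following is a minimal sketch that keeps scrolling until the number of loaded items stops growing or a round limit is reached. It assumes you can count the loaded items cheaply. `scroll_until_stable` is a hypothetical helper. The scroll, count, and wait steps are passed in as callables so the loop does not depend on one scrolling method.

```python
from typing import Callable

# Hypothetical helper (not part of the AgentQL SDK)
def scroll_until_stable(scroll: Callable[[], None], count_items: Callable[[], int], wait: Callable[[], None], max_rounds: int = 10) -> int:
    """Scroll until the item count stops increasing; return the final count."""
    previous = count_items()
    for _ in range(max_rounds):
        scroll()
        wait()  # give the page time to fetch and render the next batch
        current = count_items()
        if current <= previous:
            return current  # nothing new loaded, so assume we reached the end
        previous = current
    return previous  # round limit reached; more content may remain
```

With Playwright you might pass `lambda: mouse_wheel_scroll(page)`, `lambda: page.locator("article").count()`, and `lambda: page.wait_for_timeout(1500)`. The `article` selector is a placeholder for whatever wraps one item on your target site. Once scrolling settles, run a single `query_data()`.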

### Paginated Data Collection

For pages with explicit pagination, iterate through pages while collecting data:

```python
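async def collect_paginated_data(page, query, pages_to_collect):
    """Collect the `items` list from several consecutive result pages.

    `page` is an AgentQL-wrapped async Playwright page, and `query`
    must return an `items[]` list.
    """
    all_data = []

    for page_index in range(pages_to_collect):
        data = await page.query_data(query)
        all_data.extend(data["items"])

        # Don't click "Next" after the last page we intend to collect
        if page_index == pages_to_collect - 1:
            break

        # Navigate to next page (assumes an aria-labelled "Next" control)
        await page.click("[aria-label='Next']")
        await page.wait_for_load_state("networkidle")

    return all_data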
```

Source: [examples/python/collect_paginated_news_headlines/README.md](https://github.com/tinyfish-io/agentql/blob/main/examples/python/collect_paginated_news_headlines/README.md)

## Concurrent Data Collection

Fetch data from multiple URLs concurrently within the same browser session:

```python
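import asyncio

from playwright.async_api import async_playwright

# fetch_data(context, url) is the example's helper that handles a single URL
# in the shared browser context (not shown here)

async def main():
    WEBSITE_URLS = [
        "https://bsky.app/search?q=agents+for+the+web",
        "https://dev.to/search?q=agents%20for%20the+web",
        "https://hn.algolia.com/?q=agents%20for%20the+web",
    ]

    async with async_playwright() as p:
        async with await p.chromium.launch(headless=True) as browser:
            async with await browser.new_context() as context:
                # Run one fetch_data() per URL concurrently in the same context
                await asyncio.gather(
                    *(fetch_data(context, url) for url in WEBSITE_URLS)
                )

asyncio.run(main())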
```

Source: [examples/python/news-aggregator/main.py:1-30](https://github.com/tinyfish-io/agentql/blob/main/examples/python/news-aggregator/main.py)
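
The `asyncio.gather()` call above opens every URL at once. That is fine for three sites, but a longer list can overload the browser or trip rate limits. Also, by default the first exception propagates out of `gather()` and the other results are not returned. The following is a minimal standard-library sketch that caps concurrency with `asyncio.Semaphore` and collects failures per URL. `gather_limited` is a hypothetical helper, and the default limit of 3 is arbitrary.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, TypeVar, Union

T = TypeVar("T")

# Hypothetical helper (not part of the AgentQL SDK)
async def gather_limited(urls: List[str], fetch: Callable[[str], Awaitable[T]], limit: int = 3) -> Dict[str, Union[T, BaseException]]:
    """Run fetch(url) for every URL with at most `limit` running at once.

    Failures are returned as exception objects so one failing site does not
    hide the results of the others.
    """
    semaphore = asyncio.Semaphore(limit)

    async def run(url: str) -> T:
        async with semaphore:  # wait for a free slot before starting another fetch
            return await fetch(url)

    results = await asyncio.gather(*(run(url) for url in urls), return_exceptions=True)
    # gather() preserves input order, so results line up with urls
    return dict(zip(urls, results))
```

Inside the `async with ... as context:` block you could call `await gather_limited(WEBSITE_URLS, lambda url: fetch_data(context, url), limit=2)`.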

## Data Export Patterns

### CSV Export

```python
import os

SCRIPT_DIR = os.path.dirname(os.path.abspath(__file__))
CSV_FILE_PATH = os.path.join(SCRIPT_DIR, "news_headlines.csv")

def export_to_csv(data):
    with open(CSV_FILE_PATH, "w", encoding="utf-8") as file:
        file.write("Name,Price\n")
        for product in data["products"]:
            file.write(f"{product['name']},{product['price']}\n")
```

Source: [examples/python/list_query_usage/main.py:24-33](https://github.com/tinyfish-io/agentql/blob/main/examples/python/list_query_usage/main.py)
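
Writing rows with f-strings breaks as soon as a product name contains a comma. The following is a minimal sketch of the same export using Python's standard `csv` module, which quotes such values. `write_products_csv` is a hypothetical name. The file object is passed in so the function is easy to test.

```python
import csv
from typing import Any, Dict, Iterable, TextIO

# Hypothetical helper (not part of the AgentQL SDK)
def write_products_csv(products: Iterable[Dict[str, Any]], file: TextIO) -> None:
    """Write a Name/Price header and one row per product, quoting values as needed."""
    writer = csv.writer(file)
    writer.writerow(["Name", "Price"])
    for product in products:
        # csv.writer quotes values that contain commas, quotes, or newlines
        writer.writerow([product.get("name"), product.get("price")])
```

Open the file with `newline=""`, as the `csv` module documentation recommends, for example `with open(CSV_FILE_PATH, "w", encoding="utf-8", newline="") as file: write_products_csv(data["products"], file)`.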

### Cleaning Data for Export

When writing pipe-delimited output, remove `|` from free-text fields so it cannot be mistaken for a column separator:

```python
new_lines = []
for item in data["items"]:
    # Strip '|' from the entry so it does not create an extra column
    clean_entry = item["entry"].replace("|", "")
    new_lines.append(
        f"{item['published_date']} | {clean_entry} | {item['url']}\n"
    )
```

Source: [examples/python/news-aggregator/main.py:45-50](https://github.com/tinyfish-io/agentql/blob/main/examples/python/news-aggregator/main.py)

## Query Workflow Diagram

```mermaid
graph TD
    A[Initialize Browser with Playwright] --> B[Wrap Page with AgentQL]
    B --> C[Navigate to Target URL]
    C --> D{Select Query Method}
    D -->|Extract Data| E[Use query_data with QUERY]
    D -->|Locate Elements| F[Use query_elements or get_by_prompt]
    E --> G[Process Results]
    F --> H[Interact with Elements via Playwright]
    H --> G
    G --> I{More Pages?}
    I -->|Yes| C
    I -->|No| J[Export/Return Results]
```

## Common Query Patterns Summary

| Pattern | Use Case | Example |
|---------|----------|---------------|
| List extraction | Products, articles, items | `products[] { name, price }` |
| Type conversion | Numeric data | `price(integer)` |
| Format hints | Date formatting | `date(convert to MM/DD/YYYY)` |
| Flexible matching | Ambiguous content | `items(might be articles)[]` |
| Natural language | Element location | `get_by_prompt("Submit button")` |

## Working with the AgentQL Debugger

The [AgentQL Debugger Chrome extension](https://docs.agentql.com/installation/chrome-extension-installation) allows you to:

- Test queries interactively on any webpage
- Refine natural language selectors
- Verify element selection before writing scripts

Install the extension and use it to experiment with queries before integrating them into your scripts.

## Best Practices

1. **Start with the Debugger** - Test queries in the Chrome extension before coding
2. **Use type conversions** - Specify `(integer)` or `(float)` for numeric fields
3. **Handle edge cases** - Use format instructions like `return "n/a" if not available`
4. **Clean exported data** - Remove special characters before CSV export
5. **Test pagination** - Verify scroll and navigation methods work for your target site
6. **Use natural language sparingly** - Reserve `get_by_prompt()` for complex or dynamic selectors

## Related Documentation

- [AgentQL Query Language](https://docs.agentql.com/agentql-query/query-intro)
- [Python SDK Installation](https://docs.agentql.com/python-sdk/installation)
- [JavaScript SDK Installation](https://docs.agentql.com/javascript-sdk/installation)
- [Chrome Extension Installation](https://docs.agentql.com/installation/chrome-extension-installation)

---

<a id='page-browser-modes'></a>

## Browser Modes and Configuration

### Related Pages

Related topics: [Integrations and Framework Connections](#page-integrations)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [examples/python/run_script_in_headless_browser/main.py](https://github.com/tinyfish-io/agentql/blob/main/examples/python/run_script_in_headless_browser/main.py)
- [examples/python/stealth_mode/main.py](https://github.com/tinyfish-io/agentql/blob/main/examples/python/stealth_mode/main.py)
- [examples/python/humanlike-antibot/main.py](https://github.com/tinyfish-io/agentql/blob/main/examples/python/humanlike-antibot/main.py)
- [examples/python/use_remote_browser/main.py](https://github.com/tinyfish-io/agentql/blob/main/examples/python/use_remote_browser/main.py)
- [examples/js/humanlike-antibot/main.js](https://github.com/tinyfish-io/agentql/blob/main/examples/js/humanlike-antibot/main.js)
- [examples/js/use-existing-browser/README.md](https://github.com/tinyfish-io/agentql/blob/main/examples/js/use-existing-browser/README.md)
- [examples/python/use_existing_browser/README.md](https://github.com/tinyfish-io/agentql/blob/main/examples/python/use_existing_browser/README.md)
- [examples/js/collect-paginated-news-headlines/README.md](https://github.com/tinyfish-io/agentql/blob/main/examples/js/collect-paginated-news-headlines/README.md)
- [examples/python/collect_paginated_news_headlines/README.md](https://github.com/tinyfish-io/agentql/blob/main/examples/python/collect_paginated_news_headlines/README.md)
- [examples/js/package.json](https://github.com/tinyfish-io/agentql/blob/main/examples/js/package.json)
</details>

# Browser Modes and Configuration

AgentQL provides flexible browser configuration options through its integration with Playwright, enabling developers to customize browser behavior for various use cases including headless automation, stealth operations, human-like interaction patterns, and remote browser connections.

## Overview

Browser modes in AgentQL determine how the underlying Playwright browser instance operates during data extraction and automation tasks. The configuration system supports multiple deployment scenarios ranging from fully automated server-side operations to interactive debugging sessions.

Browser behavior itself is configured through Playwright. AgentQL then adds its query methods to a page: `agentql.wrap()` wraps a page from Playwright's synchronous API, and `agentql.wrap_async()` wraps one from the asynchronous API.

Source: [examples/python/news-aggregator/main_sync.py](https://github.com/tinyfish-io/agentql/blob/main/examples/python/news-aggregator/main_sync.py)

## Browser Launch Configuration

### Standard Browser Launch

The most common approach involves launching a browser instance directly within the script using Playwright's launch API. This provides full control over browser settings and lifecycle management.

```python
from playwright.sync_api import sync_playwright
import agentql

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    context = browser.new_context()
    page = agentql.wrap(context.new_page())
    # Perform operations
    browser.close()
```

Source: [examples/python/news-aggregator/main_sync.py](https://github.com/tinyfish-io/agentql/blob/main/examples/python/news-aggregator/main_sync.py)

### Asynchronous Browser Launch

For applications requiring concurrent operations, AgentQL supports asynchronous browser management through Python's asyncio:

```python
import asyncio
from playwright.async_api import async_playwright
import agentql

URL = "https://scrapeme.live/shop"

async def main():
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        context = await browser.new_context()
        page = await agentql.wrap_async(context.new_page())
        await page.goto(URL)
        # Perform operations
        await browser.close()

asyncio.run(main())
```

Source: [examples/python/news-aggregator/main.py](https://github.com/tinyfish-io/agentql/blob/main/examples/python/news-aggregator/main.py)

## Headless Mode

Headless mode runs the browser without a visible window. This suits server-side automation, CI pipelines, and resource-constrained environments. Set `headless=False` when you want to watch a script run while debugging.

### Configuration Parameters

These are keyword arguments of Playwright's `chromium.launch()`:

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `headless` | bool | `True` | Run without a visible browser window |
| `args` | list[str] | `None` | Extra Chromium command-line arguments |
| `downloads_path` | str | `None` | Directory where downloads are saved |

Source: [examples/python/collect_paginated_news_headlines/README.md](https://github.com/tinyfish-io/agentql/blob/main/examples/python/collect_paginated_news_headlines/README.md)
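
Production runs usually want `headless=True`, while debugging needs a visible window. The following is a minimal sketch that builds the `launch()` keyword arguments from environment variables, so the same script works in both settings. `HEADLESS` and `BROWSER_ARGS` are hypothetical variable names chosen for this example; neither AgentQL nor Playwright reads them.

```python
import os
from typing import Any, Dict, Mapping, Optional

# Hypothetical helper; HEADLESS and BROWSER_ARGS are names invented for this sketch
def launch_options(environ: Optional[Mapping[str, str]] = None) -> Dict[str, Any]:
    """Build keyword arguments for Playwright's chromium.launch() from env vars."""
    env = os.environ if environ is None else environ
    # Headless unless HEADLESS is explicitly set to a false-like value
    headless = env.get("HEADLESS", "1").strip().lower() not in {"0", "false", "no"}
    options: Dict[str, Any] = {"headless": headless}
    # BROWSER_ARGS is a space-separated list of extra Chromium switches
    extra_args = env.get("BROWSER_ARGS", "").split()
    if extra_args:
        options["args"] = extra_args
    return options
```

Use it as `browser = p.chromium.launch(**launch_options())`, then run `HEADLESS=0 python main.py` when you want to watch a session.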

### Headless Browser Workflow

```mermaid
graph TD
    A[Initialize Playwright] --> B[Launch Chromium with headless=True]
    B --> C[Create Browser Context]
    C --> D[Wrap Page with AgentQL]
    D --> E[Execute Query Operations]
    E --> F[Close Browser]
```

Source: [examples/python/run_script_in_headless_browser/main.py](https://github.com/tinyfish-io/agentql/blob/main/examples/python/run_script_in_headless_browser/main.py)

## Stealth Mode

Stealth mode configures the browser to minimize detection by anti-bot systems. This involves modifying browser attributes and behaviors that automated browsers typically expose.

### Implementation Example

The stealth mode example demonstrates configuration to avoid common automation detection vectors:

```python
from playwright.sync_api import sync_playwright
import agentql

with sync_playwright() as p:
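    browser = p.chromium.launch(
        headless=True,
        # Keeps navigator.webdriver from reporting an automated browser
        args=["--disable-blink-features=AutomationControlled"],
        # Drops the --enable-automation switch Playwright passes by default
        ignore_default_args=["--enable-automation"],
    )
    context = browser.new_context()
    # Additional stealth configurations (user agent, init scripts) go here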
    page = agentql.wrap(context.new_page())
```

Source: [examples/python/stealth_mode/main.py](https://github.com/tinyfish-io/agentql/blob/main/examples/python/stealth_mode/main.py)

### Stealth Configuration Options

| Configuration | Purpose | Implementation |
|--------------|---------|----------------|
| AutomationControlled flag | Hide webdriver presence | Chromium launch arguments |
| User agent spoofing | Match real browser signatures | Browser context settings |
| Navigator properties | Normalize exposed JavaScript values | `context.add_init_script()`, which runs before the page's own scripts |

Source: [examples/python/stealth_mode/main.py](https://github.com/tinyfish-io/agentql/blob/main/examples/python/stealth_mode/main.py)

## Humanlike Mode and Anti-Bot Evasion

Humanlike mode simulates genuine user behavior to evade anti-bot detection systems. This includes randomizing interaction timing, mimicking scroll patterns, and implementing natural mouse movements.

### Python Implementation

```python
import random
import time
from playwright.sync_api import sync_playwright
import agentql

def humanlike_scroll(page):
    """Simulate natural scrolling behavior"""
    scroll_amount = random.randint(300, 800)
    page.evaluate(f'window.scrollBy(0, {scroll_amount})')
    time.sleep(random.uniform(0.5, 2.0))

TARGET_URL = "https://scrapeme.live/shop/"  # example target; use your own

with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    page = agentql.wrap(browser.new_page())

    # Apply humanlike interaction patterns
    page.goto(TARGET_URL)
    for _ in range(random.randint(2, 5)):
        humanlike_scroll(page)

    browser.close()

Source: [examples/python/humanlike-antibot/main.py](https://github.com/tinyfish-io/agentql/blob/main/examples/python/humanlike-antibot/main.py)

### JavaScript Implementation

```javascript
const { wrap } = require('agentql');
const { chromium } = require('playwright');

const url = 'https://scrapeme.live/shop/'; // example target; use your own

// Resolve after a random pause of 500 to 2499 ms
async function humanlikeDelay() {
  const delay = Math.floor(Math.random() * 2000) + 500;
  return new Promise(resolve => setTimeout(resolve, delay));
}

async function main() {
  const browser = await chromium.launch({ headless: false });
  const page = await wrap(await browser.newPage());

  await page.goto(url);
  await humanlikeDelay();
  await browser.close();
}

main();
```

Source: [examples/js/humanlike-antibot/main.js](https://github.com/tinyfish-io/agentql/blob/main/examples/js/humanlike-antibot/main.js)

### Humanlike Interaction Patterns

| Pattern | Description | Intended Effect |
|---------|-------------|-----------------|
| Random delays | Variable wait times between actions | Avoids the uniform timing of scripted actions |
| Variable scroll | Randomized scroll distances and speeds | Resembles human browsing instead of instant jumps |
| Mouse movements | Non-linear cursor paths | Avoids straight, constant-speed cursor motion |
| Typing simulation | Randomized keystroke intervals | Avoids identical gaps between keystrokes |

Source: [examples/python/humanlike-antibot/main.py](https://github.com/tinyfish-io/agentql/blob/main/examples/python/humanlike-antibot/main.py)
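
Playwright's `page.mouse.move(x, y, steps=n)` sends its intermediate move events along a straight line. The following is a minimal sketch of a curved path that uses a quadratic Bézier curve with a randomly placed control point. `curved_mouse_path` is a hypothetical helper, and the ±100 px control-point spread is an arbitrary choice. This sketch cannot promise that the curve changes how a particular anti-bot system scores a session.

```python
import random
from typing import List, Optional, Tuple

Point = Tuple[float, float]

# Hypothetical helper (not part of the AgentQL SDK)
def curved_mouse_path(start: Point, end: Point, steps: int = 25, rng: Optional[random.Random] = None) -> List[Point]:
    """Return steps + 1 points on a quadratic Bezier curve from start to end."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    rng = rng or random.Random()
    (x0, y0), (x2, y2) = start, end
    # A control point offset from the midpoint bends the path away from a straight line
    x1 = (x0 + x2) / 2 + rng.uniform(-100, 100)
    y1 = (y0 + y2) / 2 + rng.uniform(-100, 100)
    path = []
    for i in range(steps + 1):
        t = i / steps
        # B(t) = (1-t)^2 * P0 + 2(1-t)t * P1 + t^2 * P2
        x = (1 - t) ** 2 * x0 + 2 * (1 - t) * t * x1 + t ** 2 * x2
        y = (1 - t) ** 2 * y0 + 2 * (1 - t) * t * y1 + t ** 2 * y2
        path.append((x, y))
    return path
```

Feed the points to Playwright one at a time, for example `for x, y in curved_mouse_path((100, 100), (640, 360)): page.mouse.move(x, y)`. You can optionally add a short random pause between moves.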

## Remote Browser Connection

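AgentQL can also drive a browser that is already running instead of launching a new one. This is useful for attaching to a locally started browser with remote debugging enabled, for cloud browser services, and for distributed scraping setups. Playwright makes the connection, and AgentQL wraps the resulting page as usual.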

### Connection Workflow

```mermaid
graph LR
    A[Start Remote Browser<br/>with debugging port] --> B[Connect via<br/>WebSocket URL]
    B --> C[Create AgentQL Page]
    C --> D[Execute Queries]
    D --> E[Retrieve Results]
```

Source: [examples/js/use-existing-browser/README.md](https://github.com/tinyfish-io/agentql/blob/main/examples/js/use-existing-browser/README.md)

### WebSocket Connection Format

Playwright connects to a running Chromium-based browser over the Chrome DevTools Protocol (CDP). The browser-level WebSocket endpoint has this form:

```
ws://127.0.0.1:9222/devtools/browser/{browser-id}
```

Source: [examples/python/use_existing_browser/README.md](https://github.com/tinyfish-io/agentql/blob/main/examples/python/use_existing_browser/README.md)
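
The `{browser-id}` part changes every time the browser starts. A Chromium started with `--remote-debugging-port=9222` reports the current URL as `webSocketDebuggerUrl` at `http://127.0.0.1:9222/json/version`. The following is a minimal standard-library sketch of that lookup. The function names are hypothetical, and the host and port are the defaults used on this page.

```python
import json
from urllib.request import urlopen

# Hypothetical helpers for reading Chromium's DevTools /json/version endpoint
def ws_url_from_version_json(payload: str) -> str:
    """Extract the browser-level WebSocket URL from a /json/version response body."""
    return json.loads(payload)["webSocketDebuggerUrl"]

def cdp_websocket_url(host: str = "127.0.0.1", port: int = 9222) -> str:
    """Ask a Chromium started with --remote-debugging-port for its WebSocket URL."""
    with urlopen(f"http://{host}:{port}/json/version", timeout=5) as response:
        return ws_url_from_version_json(response.read().decode("utf-8"))
```

Playwright's `connect_over_cdp()` also accepts the plain HTTP endpoint (`http://127.0.0.1:9222`) and performs this lookup itself. The helper is therefore mainly useful when some other tool needs the WebSocket URL.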

### Python Remote Browser Usage

```python
import agentql
from playwright.sync_api import sync_playwright

# Browser-level DevTools WebSocket URL of an already running Chromium
REMOTE_BROWSER_URL = "ws://127.0.0.1:9222/devtools/browser/387adf4c-243f-4051-a181-46798f4a46f4"

QUERY = """
{
    product_name
    price(integer)
}
"""

with sync_playwright() as p:
    # Connect to the remote browser instead of launching one
    browser = p.chromium.connect_over_cdp(REMOTE_BROWSER_URL)
    # new_context() opens a fresh, isolated context; use browser.contexts[0]
    # instead to reuse the running browser's existing cookies and storage
    context = browser.new_context()
    page = agentql.wrap(context.new_page())

    # Navigate to pages within the connected browser
    page.goto("https://scrapeme.live/shop/Charmander/")
    data = page.query_data(QUERY)
```

Source: [examples/python/use_remote_browser/main.py](https://github.com/tinyfish-io/agentql/blob/main/examples/python/use_remote_browser/main.py)

### JavaScript Remote Browser Usage

```javascript
const { wrap } = require('agentql');
const { chromium } = require('playwright');

const REMOTE_BROWSER_URL = 'ws://127.0.0.1:9222/devtools/browser/387adf4c-243f-4051-a181-46798f4a46f4';
const QUERY = `
{
  product_name
  price(integer)
}
`;

async function main() {
  // Connect to existing browser instance
  const browser = await chromium.connectOverCDP(REMOTE_BROWSER_URL);
  const page = await wrap(await browser.newPage());

  await page.goto('https://scrapeme.live/shop/Charmander/');
  const data = await page.queryData(QUERY);
  console.log(data);
}

main();
```

Source: [examples/js/use-existing-browser/README.md](https://github.com/tinyfish-io/agentql/blob/main/examples/js/use-existing-browser/README.md)

## Browser Context Configuration

Browser contexts provide isolation between browsing sessions, enabling parallel operations and independent cookie/storage management.

### Context Options

| Option | Type | Description |
|--------|------|-------------|
| viewport | dict | Browser window dimensions |
| user_agent | string | Custom user agent string |
| locale | string | Browser locale setting |
| timezone_id | string | Simulated timezone |
| permissions | list | Granted permissions |
| ignore_https_errors | boolean | SSL certificate handling |

Source: [examples/js/package.json](https://github.com/tinyfish-io/agentql/blob/main/examples/js/package.json)

### Multiple Context Example

```python
from playwright.sync_api import sync_playwright
import agentql

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    
    # Create multiple independent contexts
    context1 = browser.new_context(
        viewport={'width': 1920, 'height': 1080},
        user_agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
    )
    
    context2 = browser.new_context(
        viewport={'width': 1366, 'height': 768},
        locale='en-GB'
    )
    
    page1 = agentql.wrap(context1.new_page())
    page2 = agentql.wrap(context2.new_page())
```

Source: [examples/python/news-aggregator/main_sync.py](https://github.com/tinyfish-io/agentql/blob/main/examples/python/news-aggregator/main_sync.py)

## API Key Configuration

AgentQL executes queries through its cloud service, so every script needs an API key. Set it in the `AGENTQL_API_KEY` environment variable or pass it explicitly with `configure()`.

### Python Configuration

```python
from agentql import configure

# Set API key explicitly
configure(api_key="your-agentql-api-key")
```

### JavaScript Configuration

```javascript
const { wrap, configure } = require('agentql');

// Configure API key
configure({
  apiKey: process.env.AGENTQL_API_KEY
});
```

Source: [examples/js/get-by-prompt/main.js](https://github.com/tinyfish-io/agentql/blob/main/examples/js/get-by-prompt/main.js)
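
Without a check, a missing key only surfaces when the script sends its first query. The following is a minimal sketch that checks the `AGENTQL_API_KEY` environment variable (the name used in the JavaScript example above) at startup and fails with a clear message. `require_agentql_api_key` is a hypothetical helper.

```python
import os
from typing import Mapping, Optional

# Hypothetical helper (not part of the AgentQL SDK)
def require_agentql_api_key(environ: Optional[Mapping[str, str]] = None) -> str:
    """Return the AgentQL API key from the environment or raise with a clear message."""
    env = os.environ if environ is None else environ
    key = env.get("AGENTQL_API_KEY", "").strip()
    if not key:
        raise RuntimeError("AGENTQL_API_KEY is not set; export it before running the script")
    return key
```

Call it once at the top of a script, for example `configure(api_key=require_agentql_api_key())`.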

## Page Navigation and Waiting

Proper page load handling is crucial for reliable data extraction across different website architectures.

### Wait Strategies

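| Strategy | Use Case | Implementation |
|----------|----------|----------------|
| `load` | Default for navigation | `page.goto(url)` waits for the `load` event |
| `domcontentloaded` | Pages whose content is in the initial HTML | `page.goto(url, wait_until="domcontentloaded")` |
| `networkidle` | SPAs that load content after the initial page | `page.wait_for_load_state("networkidle")` |
| `commit` | Only the response or redirect target matters | `page.goto(url, wait_until="commit")` |
| Longer timeout | Slow connections | `page.goto(url, timeout=60000)` (default is 30000 ms) |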

Source: [examples/python/collect_paginated_news_headlines/README.md](https://github.com/tinyfish-io/agentql/blob/main/examples/python/collect_paginated_news_headlines/README.md)

### Navigation with AgentQL

```python
import agentql
from playwright.sync_api import sync_playwright

QUERY = """
{
    page_title
}
"""

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = agentql.wrap(browser.new_page())

    # Standard navigation (waits for the "load" event by default)
    page.goto("https://example.com")

    # Wait for dynamic content
    page.wait_for_load_state('networkidle')

    # Execute query after page is ready
    data = page.query_data(QUERY)
```

Source: [examples/js/collect-paginated-news-headlines/README.md](https://github.com/tinyfish-io/agentql/blob/main/examples/js/collect-paginated-news-headlines/README.md)

## Best Practices

### Mode Selection Guidelines

- **Headless Mode**: Use for production deployments, CI/CD pipelines, and server-side automation where no user interaction is needed
- **Stealth Mode**: Apply when targeting sites with anti-bot measures that check for automation indicators
- **Humanlike Mode**: Reserve for high-security targets requiring behavioral analysis evasion
- **Remote Browser**: Employ when debugging, testing across specific browser versions, or integrating with cloud browser services

### Edge Runtime Considerations

Community issue #128 discusses the challenges of using AgentQL with Cloudflare's Browser Rendering in edge environments. Some Node.js APIs behave differently in edge contexts, requiring adaptation of browser configuration code.

Source: [github.com/tinyfish-io/agentql/issues/128](https://github.com/tinyfish-io/agentql/issues/128)

### Performance Optimization

| Technique | Impact | Implementation |
|-----------|--------|----------------|
| Context reuse | Reduces memory overhead | Reuse contexts for related pages |
| Async operations | Improves throughput | Use `wrap_async()` for concurrent tasks |
| Headless mode | Reduces resource usage | Default to headless=True |
| Selective waits | Faster execution | Use specific wait conditions over timeouts |

Source: [examples/python/news-aggregator/main.py](https://github.com/tinyfish-io/agentql/blob/main/examples/python/news-aggregator/main.py)

---

<a id='page-data-collection'></a>

## Data Collection Patterns

### Related Pages

Related topics: [Query Examples and Patterns](#page-query-examples), [Integrations and Framework Connections](#page-integrations)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [examples/python/collect_paginated_ecommerce_listing_data/main.py](https://github.com/tinyfish-io/agentql/blob/main/examples/python/collect_paginated_ecommerce_listing_data/main.py)
- [examples/python/collect_paginated_news_headlines/main.py](https://github.com/tinyfish-io/agentql/blob/main/examples/python/collect_paginated_news_headlines/main.py)
- [examples/python/maps_scraper/main.py](https://github.com/tinyfish-io/agentql/blob/main/examples/python/maps_scraper/main.py)
- [examples/js/maps_scraper/main.js](https://github.com/tinyfish-io/agentql/blob/main/examples/js/maps_scraper/main.js)
- [examples/js/collect-paginated-ecommerce-data/main.js](https://github.com/tinyfish-io/agentql/blob/main/examples/js/collect-paginated-ecommerce-data/main.js)
- [examples/python/news-aggregator/main.py](https://github.com/tinyfish-io/agentql/blob/main/examples/python/news-aggregator/main.py)
- [examples/js/news-aggregator/main.js](https://github.com/tinyfish-io/agentql/blob/main/examples/js/news-aggregator/main.js)
- [examples/python/first_steps/main.py](https://github.com/tinyfish-io/agentql/blob/main/examples/python/first_steps/main.py)
</details>

# Data Collection Patterns

AgentQL provides robust patterns for collecting structured data from websites. These patterns leverage the query language's natural language selectors and structured output capabilities to extract data reliably across different page layouts and UI changes.

## Overview

Data collection in AgentQL revolves around extracting structured information from web pages using queries that define the expected data shape. The patterns demonstrated in the examples cover common scenarios including paginated data collection, multi-URL aggregation, and list extraction with transformations.

## Pagination Patterns

Pagination patterns enable collecting data that spans multiple pages, a common requirement for e-commerce listings, news archives, and search results.

### Python Implementation

The paginated data collection pattern uses a loop structure that:

1. Navigates to the initial page
2. Extracts data using `query_data()` with a structured query
3. Moves on to the next page, either by building the next page's URL (as the example below does) or by clicking a pagination control
4. Continues until no more pages exist or a limit is reached

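```python
# Adapted from: examples/python/collect_paginated_ecommerce_listing_data/main.py
import asyncio

import agentql
from playwright.async_api import async_playwright

URL = "https://scrapeme.live/shop"

# Each product card on the listing page becomes one item in `products`
PRODUCT_DATA_QUERY = """
{
    products[] {
        name
        price(integer)
    }
}
"""


async def main():
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        # wrap_async() adds AgentQL methods such as query_data() to the page
        page = await agentql.wrap_async(await browser.new_page())

        all_products = []
        current_page = 1
        max_pages = 5  # hard limit so the loop always terminates

        while current_page <= max_pages:
            await page.goto(f"{URL}/page/{current_page}/")

            # Query structured data from the page
            data = await page.query_data(PRODUCT_DATA_QUERY)
            all_products.extend(data.get("products", []))

            current_page += 1

        await browser.close()
        print(f"Collected {len(all_products)} products")


if __name__ == "__main__":
    asyncio.run(main())
```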

### JavaScript Implementation

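```javascript
// Adapted from: examples/js/collect-paginated-ecommerce-data/main.js
const { chromium } = require('playwright');
const agentql = require('agentql');

// Placeholder values: adjust the URL and page parameter to the target site
const baseUrl = 'https://example.com/products';
const maxPages = 5;

// Each product card becomes one item in the `products` array
const PRODUCT_QUERY = `
{
  products[] {
    name
    price(integer)
  }
}
`;

(async () => {
  const browser = await chromium.launch({ headless: true });
  // wrap() adds AgentQL methods such as queryData() to a Playwright page
  const page = await agentql.wrap(await browser.newPage());

  let pageNum = 1;
  const allProducts = [];

  while (pageNum <= maxPages) {
    await page.goto(`${baseUrl}?page=${pageNum}`);
    const data = await page.queryData(PRODUCT_QUERY);
    allProducts.push(...(data.products ?? []));
    pageNum++;
  }

  console.log(`Collected ${allProducts.length} products`);
  await browser.close();
})();
```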

### Pagination Query Structure

| Element | Query Field | Purpose |
|---------|-------------|---------|
| Product cards | `products[]` | Array of product items on each page |
| Pagination control | `next_page_button` | Element to click for next page |
| Item counter | `total_items` | Total count displayed on page |
| Page indicator | `current_page` | Current page number |

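When a page shows a total item count and a fixed number of items per page, the `total_items` field can replace a hard-coded `max_pages`. A minimal sketch, assuming the query returns `total_items` as an integer and that the per-page size is known in advance; `pages_needed` and `items_per_page` are hypothetical names:

```python
import math


def pages_needed(total_items: int, items_per_page: int, max_pages: int = 50) -> int:
    """Return how many listing pages to visit, capped at max_pages as a safety limit."""
    if items_per_page <= 0:
        raise ValueError("items_per_page must be positive")
    # Round up: 41 items at 16 per page still need a third page
    return min(math.ceil(total_items / items_per_page), max_pages)
```

The cap keeps a misread count from turning into an unbounded crawl.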
## Multi-URL Aggregation Patterns

Collecting data from multiple URLs simultaneously improves efficiency when you need to aggregate information from disparate sources.

### Concurrent Tab Collection

The news aggregator example demonstrates opening multiple URLs in separate tabs within the same browser context:

```python
# Source: examples/python/news-aggregator/main.py
import asyncio

from playwright.async_api import async_playwright

WEBSITE_URLS = [
    "https://duckduckgo.com/?q=agents+for+the+web&t=h_&iar=news&ia=news",
    # Additional URLs...
]


async def main():
    async with async_playwright() as p, await p.chromium.launch(
        headless=True
    ) as browser, await browser.new_context() as context:
        # Open one tab per URL in the shared context and fetch them concurrently;
        # fetch_data() is shown in the CSV Export section
        await asyncio.gather(
            *(fetch_data(context, url) for url in WEBSITE_URLS)
        )


if __name__ == "__main__":
    asyncio.run(main())
```

### Data Flow Architecture

```mermaid
graph TD
    A[Start Browser Context] --> B[Create Multiple Tabs]
    B --> C[Concurrent URL Navigation]
    C --> D[Query Data per Page]
    D --> E[Transform & Clean Data]
    E --> F[Write to CSV/JSON]
    F --> G[Close Browser]
```

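`asyncio.gather()` returns results in the same order as the coroutines passed to it, so each result can be paired with its source URL before the transform and write steps. A minimal sketch; `fake_fetch` and `collect_all` are hypothetical stand-ins for a real fetch that opens a tab and calls `page.query_data()`:

```python
import asyncio


async def fake_fetch(url: str) -> dict:
    """Stand-in for a real fetch that would open a tab and call page.query_data()."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return {"items": [{"entry": f"Headline from {url}"}]}


async def collect_all(urls: list[str]) -> list[dict]:
    # gather() runs the fetches concurrently and preserves input order in its result
    results = await asyncio.gather(*(fake_fetch(url) for url in urls))
    merged = []
    for url, data in zip(urls, results):
        for item in data.get("items", []):
            # Tag each item with the page it came from before transforming/writing
            merged.append({**item, "source_url": url})
    return merged
```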
### Handling Multi-Source Data

Each news site lays out its results differently. The aggregator sends the same AgentQL query to every site, so all sources return items with the same field names:

```python
# Source: examples/python/news-aggregator/main.py
QUERY = """
{
    items[] {
        entry
        published_date
        url
        outlet
        author
    }
}
"""
```

## List Extraction Patterns

Extracting lists of items requires defining array fields in the AgentQL query syntax using `[]` notation.

### Basic List Query

```python
# Source: examples/python/first_steps/main.py
PRODUCT_DATA_QUERY = """
{
    price_currency
    products[] {
        name
        price(integer)
    }
}
"""
```

The `products[]` notation defines an array of items, each with `name` and `price` fields. Text in parentheses after a field gives AgentQL extra context in natural language; here `(integer)` asks for the price as a number rather than the raw price string.

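`query_data()` returns a dictionary that mirrors the query's shape: top-level fields become keys and `products[]` becomes a list of dictionaries. A minimal sketch of consuming that result; `format_products` is a hypothetical helper and `sample` is made-up data in the expected shape, not real output:

```python
def format_products(data: dict) -> list[str]:
    """Combine the page-level currency with each product's name and price."""
    currency = data.get("price_currency") or ""
    lines = []
    for product in data.get("products", []):
        # price(integer) asks for a number, but guard against a missing value
        price = product.get("price")
        price_text = f"{currency}{price}" if price is not None else "N/A"
        lines.append(f"{product.get('name', 'unknown')}: {price_text}")
    return lines


# Hypothetical data in the shape PRODUCT_DATA_QUERY would produce
sample = {
    "price_currency": "$",
    "products": [{"name": "Widget A", "price": 63}, {"name": "Widget B", "price": None}],
}
```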
### Data Transformation During Extraction

Type hints can go inline in the query; other cleanup is usually done in code after extraction:

| Transform | Where it happens | Example |
|-----------|------------------|---------|
| Type conversion | Parenthetical hint in the query | `price(integer)` |
| String cleaning | Python, after `query_data()` returns | `title.strip()` |
| Array filtering | Python, after `query_data()` returns | `[i for i in items if i["count"] > 0]` |

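Cleanup that is not expressed in the query can be applied in Python to the returned dictionaries. A minimal sketch that trims whitespace from string values and keeps only items with a positive count; `clean_items` and the `title`/`count` field names are hypothetical:

```python
def clean_items(items: list[dict]) -> list[dict]:
    """Trim string fields and drop items whose count is missing or not positive."""
    cleaned = []
    for item in items:
        # Strip surrounding whitespace from every string value
        item = {k: v.strip() if isinstance(v, str) else v for k, v in item.items()}
        count = item.get("count")
        if isinstance(count, int) and count > 0:
            cleaned.append(item)
    return cleaned
```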
## Maps and Location Data Collection

The maps scraper examples demonstrate collecting geographic and location-based data:

### Python Maps Scraper

```python
# Source: examples/python/maps_scraper/main.py
LOCATION_QUERY = """
{
    business_name
    rating
    reviews_count
    address
    phone
    website
    category
}
"""
```

### JavaScript Maps Scraper

```javascript
// Source: examples/js/maps_scraper/main.js
const LOCATION_QUERY = `
{
    business_name
    rating
    reviews_count
    address
    phone
    website
    category
}
`;
```

Both implementations follow the same pattern:

1. Navigate to the map service URL with search parameters
2. Wait for results to load
3. Execute the query to extract structured location data
4. Store results in the desired format

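The four steps can be sketched as one coroutine. This is a minimal sketch, not the examples' code: it assumes an AgentQL-wrapped async page is passed in, uses a Google Maps search URL of the form `https://www.google.com/maps/search/<terms>` as an illustrative target, and wraps the fields in a hypothetical `listings[]` array so that every result on the page is returned rather than a single business:

```python
from urllib.parse import quote_plus

# Hypothetical list variant of LOCATION_QUERY: one entry per search result
LISTINGS_QUERY = """
{
    listings[] {
        business_name
        rating
        address
    }
}
"""


def build_search_url(terms: str) -> str:
    # Step 1: encode the search terms into the map service URL
    return f"https://www.google.com/maps/search/{quote_plus(terms)}"


async def scrape_locations(page, terms: str) -> list[dict]:
    """page is assumed to be an AgentQL-wrapped Playwright page (agentql.wrap_async)."""
    await page.goto(build_search_url(terms))
    # Step 2: wait for the results to finish loading
    await page.wait_for_load_state("networkidle")
    # Step 3: extract structured location data
    data = await page.query_data(LISTINGS_QUERY)
    # Step 4: return plain dictionaries, ready to write to CSV or JSON
    return data.get("listings", [])
```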
## Data Export Patterns

AgentQL examples demonstrate multiple export formats for collected data.

### CSV Export

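```python
# Source: examples/python/news-aggregator/main.py
import os

import agentql
from playwright.async_api import BrowserContext

SCRIPT_DIR = os.path.dirname(os.path.abspath(__file__))
CSV_FILE_PATH = os.path.join(SCRIPT_DIR, "news_headlines.csv")


async def fetch_data(context: BrowserContext, session_url):
    page = await agentql.wrap_async(await context.new_page())
    await page.goto(session_url)

    data = await page.query_data(QUERY)  # QUERY from "Handling Multi-Source Data"

    # Build one pipe-separated line per news item
    new_lines = []
    for item in data["items"]:
        # Strip '|' from the entry so it is not read as a column separator
        clean_entry = item["entry"].replace("|", "")
        new_lines.append(
            f"{item['published_date']} | {clean_entry} | {item['url']} | {item['outlet']} | {item['author']}\n"
        )

    # Append this tab's lines to the shared output file
    with open(CSV_FILE_PATH, "a", encoding="utf-8") as f:
        f.writelines(new_lines)

    await page.close()
```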

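Stripping `|` from one field works, but any other field containing the delimiter would still shift columns. Python's `csv` module avoids this by quoting such values automatically. A minimal alternative sketch (not what the example does) that writes the same fields with `csv.DictWriter`, keeping `|` as the delimiter; `write_items` and `FIELDS` are hypothetical names:

```python
import csv

FIELDS = ["published_date", "entry", "url", "outlet", "author"]


def write_items(items: list[dict], stream) -> None:
    """Write news items as pipe-delimited rows; values containing '|' get quoted."""
    writer = csv.DictWriter(stream, fieldnames=FIELDS, delimiter="|", extrasaction="ignore")
    writer.writeheader()
    writer.writerows(items)
```

When writing to a file, open it with `newline=""` (for example `open(CSV_FILE_PATH, "w", newline="", encoding="utf-8")`), as the `csv` module documentation recommends. With several concurrent tabs, collecting all items first and writing once also avoids repeated header rows.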
### Data Cleaning During Export

| Issue | Solution | Example |
|-------|----------|---------|
| CSV delimiter collision | Strip delimiter characters | `item["entry"].replace("\|", "")` |
| Type inconsistency | Apply transforms in query | `price(integer)` |
| Missing fields | Provide defaults | `field or "N/A"` |
| Whitespace | Trim strings | `field.strip()` |

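The missing-field and whitespace rows can be combined into one normalization step applied to each item before export. A minimal sketch; `normalize_item`, the `"N/A"` default, and the field list are illustrative choices:

```python
def normalize_item(item: dict, fields: list[str], default: str = "N/A") -> dict:
    """Return a dict with every expected field present, trimmed, and defaulted."""
    normalized = {}
    for field in fields:
        value = item.get(field)
        if isinstance(value, str):
            value = value.strip()
        # `value or default` would also replace 0, so test for None/empty explicitly
        normalized[field] = default if value is None or value == "" else value
    return normalized
```

The explicit `None`/empty check is the one design choice worth noting: the `field or "N/A"` idiom from the table also replaces legitimate falsy values such as `0`.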
## Error Handling Patterns

Resilient data collection requires proper error handling for network issues, page load failures, and query mismatches.

### Try-Except Block Pattern

```python
# Source: examples/python/collect_paginated_news_headlines/main.py
BASE_URL = "https://example.com/news?sort=new"  # hypothetical: use the target site's paginated URL


async def collect_headlines(page, query, max_pages=10):
    """Collect headlines page by page; `page` is an AgentQL-wrapped async page."""
    all_headlines = []

    for page_num in range(1, max_pages + 1):
        try:
            await page.goto(f"{BASE_URL}&page={page_num}")
            await page.wait_for_load_state("networkidle")

            data = await page.query_data(query)
            headlines = data.get("headlines", [])

            if not headlines:
                break  # An empty page means there is no more data

            all_headlines.extend(headlines)

        except Exception as e:
            # Log and move on so one bad page does not abort the whole run
            print(f"Error on page {page_num}: {e}")
            continue

    return all_headlines
```

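The loop above skips a page after a single failed attempt. When failures are usually transient (timeouts, dropped connections), retrying a few times with a growing delay can recover those pages. A minimal sketch; `with_retries` is a hypothetical helper, not part of the AgentQL SDK:

```python
import asyncio


async def with_retries(operation, attempts: int = 3, base_delay: float = 1.0):
    """Await operation() up to `attempts` times, doubling the delay after each failure."""
    for attempt in range(1, attempts + 1):
        try:
            return await operation()
        except Exception as error:  # broad on purpose, like the loop above
            if attempt == attempts:
                raise  # out of attempts: let the caller decide (log, skip, abort)
            delay = base_delay * 2 ** (attempt - 1)
            print(f"Attempt {attempt} failed ({error}); retrying in {delay:.1f}s")
            await asyncio.sleep(delay)
```

Usage inside the loop: `data = await with_retries(lambda: page.query_data(query))`. The `lambda` creates a fresh coroutine on every attempt, since an awaited coroutine cannot be awaited again.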
### Resilience to UI Changes

AgentQL's natural language selectors make queries more resilient to UI changes. Because a query describes what the content is rather than where it sits in the DOM, it often keeps working after a redesign that would break CSS or XPath selectors tied to the old structure.

## Best Practices

### Query Design

- **Use semantic field names**: Match query field names to visible content, not DOM attributes
- **Define array fields explicitly**: Use `[]` notation for lists of similar items
- **Apply transforms early**: Use type conversions in queries rather than post-processing
- **Handle missing data**: AgentQL returns `None` (`null` in JavaScript) for fields it cannot find on the page, so check for missing values before using them

### Performance Optimization

| Technique | Implementation |
|-----------|----------------|
| Concurrent tab collection | Use `asyncio.gather()` for multiple URLs |
| Headless browsing | Set `headless=True` for server environments |
| Context reuse | Reuse browser contexts to maintain session state |
| Pagination limits | Set maximum page counts to prevent infinite loops |

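`asyncio.gather()` as used in the news aggregator opens every tab at once. For long URL lists, an `asyncio.Semaphore` can cap how many tabs are active at the same time. A minimal sketch; `gather_limited` is a hypothetical helper, `fetch` stands in for a coroutine such as the aggregator's `fetch_data`, and the limit of 5 is an arbitrary default:

```python
import asyncio


async def gather_limited(urls, fetch, limit: int = 5):
    """Run fetch(url) for every URL, with at most `limit` calls in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(url):
        async with semaphore:  # waits here while `limit` fetches are already running
            return await fetch(url)

    # Results come back in the same order as `urls`
    return await asyncio.gather(*(guarded(url) for url in urls))
```

Usage with the aggregator: `await gather_limited(WEBSITE_URLS, lambda url: fetch_data(context, url), limit=3)`.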
### Cross-Site Compatibility

The same AgentQL query can work across sites with similar content structure. For example, a product listing query designed for one e-commerce site may work on another with minimal modification due to the natural language selector approach.

## Related Documentation

- [AgentQL Query Language](https://docs.agentql.com/agentql-query/query-intro)
- [Python SDK Installation](https://docs.agentql.com/python-sdk/installation)
- [JavaScript SDK Installation](https://docs.agentql.com/javascript-sdk/installation)
- [Chrome Extension for Query Testing](https://docs.agentql.com/installation/chrome-extension-installation)

---

<a id='page-integrations'></a>

## Integrations and Framework Connections

### Related Pages

Related topics: [REST API](#page-rest-api), [Browser Modes and Configuration](#page-browser-modes)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [examples/python/log_into_sites/main.py](https://github.com/tinyfish-io/agentql/blob/main/examples/python/log_into_sites/main.py)
- [examples/python/save_and_load_authenticated_session/main.py](https://github.com/tinyfish-io/agentql/blob/main/examples/python/save_and_load_authenticated_session/main.py)
- [examples/js/log-into-sites/main.js](https://github.com/tinyfish-io/agentql/blob/main/examples/js/log-into-sites/main.js)
- [examples/python/news-aggregator/main.py](https://github.com/tinyfish-io/agentql/blob/main/examples/python/news-aggregator/main.py)
- [examples/python/perform_sentiment_analysis/main.py](https://github.com/tinyfish-io/agentql/blob/main/examples/python/perform_sentiment_analysis/main.py)
- [examples/python/first_steps/main.py](https://github.com/tinyfish-io/agentql/blob/main/examples/python/first_steps/main.py)
</details>

# Integrations and Framework Connections

AgentQL provides flexible integration options with various frameworks, automation tools, and deployment environments. This page covers the available SDKs, framework connections, authentication patterns, and deployment considerations.

## Overview

AgentQL connects LLMs and AI agents to the web through its query language and Playwright integrations. The platform offers multiple integration pathways:

| Integration Type | Description |
|-----------------|-------------|
| Python SDK | Running automation and scraping scripts with AgentQL queries in Python |
| JavaScript SDK | Running automation and scraping scripts with AgentQL queries in JavaScript |
| REST API | Executing queries without an SDK |
| MCP Server | Model Context Protocol integration for AI agents |
| Framework Integrations | LangChain, Zapier, and other automation tools |

## SDK Integration Architecture

AgentQL builds on Playwright, the browser automation library. Both the Python and JavaScript SDKs wrap Playwright `Page` objects to add AgentQL query methods.

### Python SDK Integration

The Python SDK works with both of Playwright's APIs: `agentql.wrap()` extends a sync-API page with AgentQL querying methods, and `agentql.wrap_async()` does the same for an async-API page.

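```python
import agentql
from agentql.ext.playwright.sync_api import Page
from playwright.sync_api import sync_playwright

URL = "https://scrapeme.live/shop"  # example target page


def main():
    with sync_playwright() as playwright, playwright.chromium.launch(headless=False) as browser:
        # wrap() returns an AgentQL-enabled page with query_data() and query_elements()
        page: Page = agentql.wrap(browser.new_page())
        page.goto(URL)


if __name__ == "__main__":
    main()
```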

Source: [examples/python/first_steps/main.py](https://github.com/tinyfish-io/agentql/blob/main/examples/python/first_steps/main.py), lines 1-19

### JavaScript SDK Integration

The JavaScript SDK follows a similar pattern, wrapping Playwright page objects to provide AgentQL querying methods.

```javascript
const { chromium } = require('playwright');
const agentql = require('agentql');

(async () => {
  const browser = await chromium.launch();
  // wrap() takes a Playwright Page and returns an AgentQL-enabled page
  const page = await agentql.wrap(await browser.newPage());
  await page.goto('https://example.com');
  await browser.close();
})();
```

Source: [examples/js/log-into-sites/main.js](https://github.com/tinyfish-io/agentql/blob/main/examples/js/log-into-sites/main.js), lines 1-50

### SDK Dependencies

| Example set | Key dependencies |
|-------------|------------------|
| Python examples | `agentql`, `playwright` |
| JavaScript examples | `agentql`, `playwright`, `playwright-dompath`, `openai` |

Source: `examples/js/package.json`, lines 1-30

## Authentication and Session Management

AgentQL supports authenticated web interactions through session persistence and browser context management.

### Login Pattern

Authentication is handled by logging in before running AgentQL queries: navigate to the login page, enter the credentials, submit the form, and then run queries within the authenticated session.

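```python
import os

# Hypothetical placeholders: use the target site's URL and selectors
LOGIN_URL = "https://example.com/login"
USERNAME_SELECTOR, PASSWORD_SELECTOR, LOGIN_BUTTON = "#username", "#password", "button[type=submit]"
USERNAME, PASSWORD = os.getenv("SITE_USERNAME", ""), os.getenv("SITE_PASSWORD", "")

async def log_in(page):
    await page.goto(LOGIN_URL)
    await page.fill(USERNAME_SELECTOR, USERNAME)
    await page.fill(PASSWORD_SELECTOR, PASSWORD)
    await page.click(LOGIN_BUTTON)
    await page.wait_for_load_state("networkidle")  # let post-login requests settle
```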

Source: [examples/python/log_into_sites/main.py](https://github.com/tinyfish-io/agentql/blob/main/examples/python/log_into_sites/main.py), lines 1-60

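The login step can also locate the form fields with an AgentQL query instead of CSS selectors, which keeps it working if the form's markup changes. A minimal sketch, assuming an AgentQL-wrapped async page; `query_elements()` returns Playwright locators addressable by field name, while `log_in_with_agentql`, the field names, and the `SITE_USERNAME`/`SITE_PASSWORD` variable names are illustrative:

```python
import os

# Field names describe what the elements are; AgentQL matches them semantically
LOGIN_FORM_QUERY = """
{
    username_field
    password_field
    log_in_button
}
"""


async def log_in_with_agentql(page, login_url: str) -> None:
    """page is assumed to be wrapped with agentql.wrap_async()."""
    await page.goto(login_url)
    form = await page.query_elements(LOGIN_FORM_QUERY)
    # Credentials come from the environment rather than the source code
    await form.username_field.fill(os.environ["SITE_USERNAME"])
    await form.password_field.fill(os.environ["SITE_PASSWORD"])
    await form.log_in_button.click()
    await page.wait_for_load_state("networkidle")
```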
### Session Persistence

Authenticated sessions can be saved and restored using Playwright's storage state mechanism. This allows maintaining login state across script executions.

```python
async def save_authenticated_session(context, storage_path):
    await context.storage_state(path=storage_path)

async def load_authenticated_session(browser, storage_path):
    context = await browser.new_context(storage_state=storage_path)
    return context
```

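Source: [examples/python/save_and_load_authenticated_session/main.py](https://github.com/tinyfish-io/agentql/blob/main/examples/python/save_and_load_authenticated_session/main.py), lines 1-80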

### Session Flow

```mermaid
graph TD
    A[Launch Browser] --> B{Check for Existing Session}
    B -->|Session Exists| C[Load Storage State]
    B -->|No Session| D[Create New Context]
    C --> E[Navigate to Target URL]
    D --> F[Login to Site]
    F --> E
    E --> G[Execute AgentQL Queries]
    G --> H[Optional: Save Session]
```

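The flow can be sketched with Playwright's storage-state support. A minimal sketch, assuming an async Playwright `browser` and a `log_in(page)` coroutine such as the one in the Login Pattern section; `get_authenticated_context` and `STATE_PATH` are hypothetical names:

```python
import os

STATE_PATH = "auth_state.json"  # hypothetical location for the saved session


async def get_authenticated_context(browser, log_in, state_path: str = STATE_PATH):
    """Reuse a saved session if one exists; otherwise log in and save a new one."""
    if os.path.exists(state_path):
        # Session exists: restore cookies and local storage into a fresh context
        return await browser.new_context(storage_state=state_path)

    # No session: create a clean context, log in, then persist the state
    context = await browser.new_context()
    page = await context.new_page()
    await log_in(page)
    await context.storage_state(path=state_path)
    await page.close()
    return context
```

The caller then navigates to the target URL in the returned context and runs its AgentQL queries. A saved session can still expire, so a script may need to delete the file and log in again when queries land on a login page.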
## Framework Integrations

### LangChain Integration

AgentQL integrates with LangChain for building agent workflows that interact with web pages. The integration allows LangChain agents to use natural language queries that translate to AgentQL queries.

> **Community Note:** The LangChain integration enables AI agents to browse and extract data from websites using natural language instructions.

### Zapier Integration

AgentQL provides Zapier integration for no-code automation workflows, enabling users to incorporate web data extraction into automated processes without writing code.

### MCP Server

The Model Context Protocol (MCP) server integration allows AI agents to interact with web pages through a standardized protocol. This enables:

- Remote browser control
- Query execution via API
- Integration with AI agent frameworks

## External AI Service Integration

AgentQL can be combined with external AI services for advanced data processing, such as sentiment analysis on extracted content.

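```python
import os

from openai import OpenAI

# Illustrative prompt; the example defines its own SYSTEM_MESSAGE
SYSTEM_MESSAGE = "You are an expert in sentiment analysis. Describe the overall sentiment of these comments."


def perform_sentiment_analysis(comments):
    client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
    # Pass the comments extracted by AgentQL to the model as the user message
    user_message = "\n".join(str(comment) for comment in comments)
    completion = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": SYSTEM_MESSAGE},
            {"role": "user", "content": user_message},
        ],
    )
    return completion.choices[0].message.content
```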

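Source: [examples/python/perform_sentiment_analysis/main.py](https://github.com/tinyfish-io/agentql/blob/main/examples/python/perform_sentiment_analysis/main.py), lines 1-50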

### Data Processing Pipeline

```mermaid
graph LR
    A[Web Page] -->|AgentQL Query| B[Extract Data]
    B --> C[Process with LLM]
    C -->|Sentiment| D[Analysis Results]
    C -->|Summary| E[Content Summary]
```

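The two branches of the pipeline differ mainly in the instruction sent to the model. A minimal sketch of building the `messages` list for either branch; `build_messages`, `PROMPTS`, and the prompt wording are illustrative, and the result would be passed to `client.chat.completions.create(...)` as in the sentiment example:

```python
PROMPTS = {
    # Hypothetical instructions for the two pipeline branches
    "sentiment": "Classify the overall sentiment of these comments as positive, negative, or mixed.",
    "summary": "Summarize the main points of these comments in three sentences.",
}


def build_messages(task: str, texts: list[str]) -> list[dict]:
    """Build chat messages for one analysis task over text extracted by AgentQL."""
    if task not in PROMPTS:
        raise ValueError(f"unknown task: {task!r}")
    # One extracted item per line keeps separate comments distinguishable
    joined = "\n".join(f"- {text}" for text in texts)
    return [
        {"role": "system", "content": PROMPTS[task]},
        {"role": "user", "content": joined},
    ]
```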
## Cloudflare Browser Rendering Integration

> **Community Note:** Issue #128 discusses using AgentQL with Cloudflare's Browser Rendering feature, which provides browser instances from Cloudflare Workers via Playwright.

The integration with Cloudflare Browser Rendering enables:

- Edge-based browser automation
- Scalable browser infrastructure
- Serverless web scraping workflows

### Edge Environment Considerations

When deploying AgentQL in edge environments like Cloudflare Workers:

- Node.js APIs may have limitations
- CDP (Chrome DevTools Protocol) connection handling differs from standard Node.js
- Browser instance lifecycle management requires careful handling

Source: [Issue #128: AgentQL JS x Cloudflare Browser Rendering](https://github.com/tinyfish-io/agentql/issues/128)

## REST API Integration

For environments where SDK installation is not feasible, AgentQL provides a REST API for executing queries without an SDK.

| Endpoint Type | Use Case |
|--------------|----------|
| Query Execution | Execute AgentQL queries via HTTP |
| Data Extraction | Retrieve structured data from web pages |

Source: [REST API Documentation](https://docs.agentql.com/rest-api/api-reference)

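A query can be sent over HTTP with only the standard library. A minimal sketch, assuming the `POST https://api.agentql.com/v1/query-data` endpoint, an `X-API-Key` header, and a JSON body with `url` and `query` fields; check the REST API reference linked above for the current request and response schema before relying on it. `build_request` and `query_data` are hypothetical helper names:

```python
import json
import os
import urllib.request

# Assumed endpoint; verify against the REST API reference
API_URL = "https://api.agentql.com/v1/query-data"


def build_request(url: str, query: str, api_key: str) -> urllib.request.Request:
    """Prepare a POST request asking AgentQL to run `query` against `url`."""
    body = json.dumps({"url": url, "query": query}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def query_data(url: str, query: str) -> dict:
    """Send the request and return the decoded JSON response."""
    request = build_request(url, query, os.environ["AGENTQL_API_KEY"])
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.load(response)
```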
## Integration Patterns

### Concurrent Data Collection

AgentQL supports concurrent page interactions using async patterns:

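```python
import asyncio

from playwright.async_api import async_playwright


async def main():
    async with async_playwright() as p, await p.chromium.launch(headless=True) as browser, await browser.new_context() as context:
        # One fetch_data() task per URL, all sharing one browser context;
        # fetch_data and WEBSITE_URLS are defined as in the news-aggregator example
        await asyncio.gather(
            *(fetch_data(context, url) for url in WEBSITE_URLS)
        )
```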

Source: [examples/python/news-aggregator/main.py](https://github.com/tinyfish-io/agentql/blob/main/examples/python/news-aggregator/main.py), lines 1-40

### Pagination Handling

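Click-based pagination can be handled by asking AgentQL for the next-page control after each page is collected, and stopping when it is missing or disabled:

```python
# The field name describes the control; AgentQL locates it semantically
NEXT_PAGE_QUERY = """
{
    next_page_button
}
"""


async def collect_paginated_data(page, query, max_pages=10):
    all_items = []
    for _ in range(max_pages):  # hard cap so the loop always ends
        data = await page.query_data(query)
        all_items.extend(data.get("items", []))
        # AgentQL returns None for an element it cannot find
        response = await page.query_elements(NEXT_PAGE_QUERY)
        next_button = response.next_page_button
        if next_button is None or not await next_button.is_enabled():
            break
        await next_button.click()
        await page.wait_for_load_state("networkidle")
    return all_items
```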

### Multi-Tab Browser Context

For concurrent operations, AgentQL supports multiple tabs within a single browser context:

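```python
import agentql


async def fetch_data(context, url):
    # Each call opens its own tab in the shared browser context
    page = await agentql.wrap_async(await context.new_page())
    try:
        await page.goto(url)
        return await page.query_data(QUERY)  # QUERY: the AgentQL query string to run
    finally:
        await page.close()  # close the tab even if navigation or the query fails
```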

## Configuration Options

### Browser Options

| Option | Type | Description |
|--------|------|-------------|
| headless | boolean | Run browser without visible UI (`launch()` option) |
| args | list | Additional command-line arguments passed to the browser (`launch()` option) |
| viewport | dict | Page viewport dimensions; set on `new_context()` or `new_page()`, not on `launch()` |

### Query Options

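Keyword arguments accepted by the Python SDK's `query_data()` and `query_elements()` include the following; the SDK API reference lists defaults and any additional options:

| Option | Description |
|--------|-------------|
| timeout | Maximum time to wait for query results |
| wait_for_network_idle | Wait for network activity on the page to settle before querying |
| include_hidden | Whether elements that are not visible on the page are considered |
| mode | `"fast"` or `"standard"` (slower, more thorough analysis of the page) |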

## Best Practices

### Error Handling

- Implement retry logic for network failures
- Handle authentication session expiration gracefully
- Use appropriate timeouts for slow-loading pages

### Resource Management

- Close browser contexts when operations complete
- Use headless mode for production deployments
- Reuse browser instances for multiple queries when possible

### Security Considerations

- Store credentials securely (environment variables, secrets management)
- Implement session timeout policies
- Validate SSL certificates for production use

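For the first point, credentials can be read from environment variables at startup so that a missing value fails fast with a clear message rather than partway through a login. A minimal sketch; `require_env` and the variable names in the usage line are hypothetical:

```python
import os


def require_env(*names: str) -> dict[str, str]:
    """Return the requested environment variables, failing if any is unset or empty."""
    missing = [name for name in names if not os.environ.get(name)]
    if missing:
        raise RuntimeError(f"Missing required environment variables: {', '.join(missing)}")
    return {name: os.environ[name] for name in names}
```

Usage: `creds = require_env("SITE_USERNAME", "SITE_PASSWORD")` before launching the browser.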
## Related Documentation

- [Python SDK Installation](https://docs.agentql.com/python-sdk/installation)
- [JavaScript SDK Installation](https://docs.agentql.com/javascript-sdk/installation)
- [AgentQL Query Language](https://docs.agentql.com/agentql-query/query-intro)
- [REST API Reference](https://docs.agentql.com/rest-api/api-reference)
- [Integrations Overview](https://docs.agentql.com/integrations)

---


## Pitfall Log

Project: tinyfish-io/agentql

Summary: Found 8 structured pitfall item(s), including 0 high/blocking item(s). Top priority: Installation risk - Installation risk requires verification.

## 1. Installation risk - Installation risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Suggested check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | cevd_761b694cc0e94100b46ba5683041137b | https://github.com/tinyfish-io/agentql/issues/114

## 2. Installation risk - Installation risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Suggested check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | cevd_55a8aa1466634fb39e0b679f753270ec | https://github.com/tinyfish-io/agentql/issues/148

## 3. Capability evidence risk - Capability evidence risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: README/documentation is current enough for a first validation pass.
- User impact: May increase setup, validation, or first-run risk for the user.
- Suggested check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: capability.assumptions | github_repo:760722197 | https://github.com/tinyfish-io/agentql

## 4. Maintenance risk - Maintenance risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags a maintenance risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Suggested check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: evidence.maintainer_signals | github_repo:760722197 | https://github.com/tinyfish-io/agentql

## 5. Security or permission risk - Security or permission risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: `no_demo`
- User impact: May increase setup, validation, or first-run risk for the user.
- Suggested check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: downstream_validation.risk_items | github_repo:760722197 | https://github.com/tinyfish-io/agentql

## 6. Security or permission risk - Security or permission risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: `no_demo`
- User impact: May increase setup, validation, or first-run risk for the user.
- Suggested check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: risks.scoring_risks | github_repo:760722197 | https://github.com/tinyfish-io/agentql

## 7. Maintenance risk - Maintenance risk requires verification

- Severity: low
- Evidence strength: source_linked
- Finding: `issue_or_pr_quality=unknown`.
- User impact: May increase setup, validation, or first-run risk for the user.
- Suggested check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: evidence.maintainer_signals | github_repo:760722197 | https://github.com/tinyfish-io/agentql

## 8. Maintenance risk - Maintenance risk requires verification

- Severity: low
- Evidence strength: source_linked
- Finding: `release_recency=unknown`.
- User impact: May increase setup, validation, or first-run risk for the user.
- Suggested check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: evidence.maintainer_signals | github_repo:760722197 | https://github.com/tinyfish-io/agentql
