Doramagic Project Pack · Human Manual

agentql

Introduction to AgentQL

Related topics: Quick Start Guide, Python SDK

AgentQL is an open-source framework that connects Large Language Models (LLMs) and AI agents to the web through a natural language query language. It enables developers to extract structured data, automate web interactions, and build web scraping solutions using intuitive queries that remain resilient to UI changes over time.

Overview

AgentQL addresses a fundamental challenge in web automation: traditional selectors (CSS, XPath) are brittle and break when web pages change. AgentQL uses natural language queries to locate elements and extract data, making automation scripts more maintainable and adaptable.

The framework integrates seamlessly with Playwright, supporting both Python and JavaScript environments. It works on any webpage—public sites, private pages, URLs behind authentication—regardless of the site's structure or technology.

Source: README.md

Core Features

| Feature | Description |
| --- | --- |
| Natural Language Selectors | Find elements and data using intuitive queries based on page content |
| Structured Output | Define data shapes within queries for consistent structured results |
| Cross-Site Compatibility | Use the same query across different sites with similar content |
| Transforms and Extracts | Apply data transformations directly within queries |
| Resilience to UI Changes | Queries self-heal as page structures evolve |
| Works on Any Page | Public, private, or authenticated: any URL |

Source: README.md

Architecture

AgentQL follows a client-side wrapper pattern where the AgentQL SDK wraps Playwright's page objects to extend their functionality with query capabilities.

graph TD
    A[Developer] -->|Writes AgentQL Query| B[AgentQL SDK]
    B -->|Wraps| C[Playwright Page Object]
    C -->|Interacts with| D[Web Page]
    D -->|Returns DOM| C
    C -->|Processes| B
    B -->|Structured JSON| A
    
    E[LLM Backend] <-->|Natural Language Processing| B

Query Methods

The SDK provides three primary API methods for interacting with web pages:

| Method | Purpose | Use Case |
| --- | --- | --- |
| query_elements() | Locate DOM elements | Automation, clicking, typing |
| query_data() | Extract structured data | Scraping, data collection |
| get_by_prompt() | Natural language element lookup | Finding elements by description |

Source: examples/python/first_steps/main.py:1-80

SDKs and Tools

AgentQL provides multiple entry points for different development environments:

Python SDK

The Python SDK integrates with Playwright's synchronous API for automation and scraping scripts.

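import agentql
from agentql.ext.playwright.sync_api import Page
from playwright.sync_api import sync_playwright

# SEARCH_BOX_QUERY and PRODUCT_DATA_QUERY are AgentQL query strings defined elsewhere
with sync_playwright() as playwright:
    browser = playwright.chromium.launch()
    # Wrap the Playwright page to add the AgentQL query methods
    page = agentql.wrap(browser.new_page())
    response = page.query_elements(SEARCH_BOX_QUERY)
    data = page.query_data(PRODUCT_DATA_QUERY)
    browser.close()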

Installation: pip install agentql
Documentation: Python SDK Installation

Source: examples/python/first_steps/main.py:1-16

JavaScript SDK

The JavaScript SDK works with Playwright for Node.js environments.

import { chromium } from 'playwright';
import { wrap } from 'agentql';

async function main() {
  const browser = await chromium.launch();
  // wrap() adds the AgentQL query methods to a Playwright page
  const page = await wrap(await browser.newPage());
  // Use page.queryElements() and page.queryData()
  await browser.close();
}

Installation: Available via npm
Documentation: JavaScript SDK Installation

Source: examples/js/first-steps/README.md

REST API

Execute AgentQL queries without installing an SDK via the REST API endpoint.

Documentation: REST API Reference

Source: README.md

Additional Tools

| Tool | Purpose |
| --- | --- |
| Debugger Chrome Extension | Debug and refine queries in real-time on live sites |
| Playground | Interactive environment to test queries and export scripts |
| AgentQL Query Language | Define queries with natural language syntax |
| MCP Server | Integration for agent frameworks |
| LangChain Integration | Connect with LangChain for agentic workflows |

Source: README.md

Query Language

AgentQL queries use a GraphQL-like syntax to define what elements to find and what data to extract.

Basic Element Query

{
    search_product_box
    submit_button
    results_container
}

Source: examples/python/first_steps/main.py:23-30

Data Extraction Query

{
    price_currency
    products[] {
        name
        price(integer)
    }
}

The [] notation queries lists of items, and type annotations like (integer) apply transformations to extracted values.

Source: examples/python/first_steps/main.py:32-39
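
To make the result shape concrete, the sketch below shows how a response to the query above might look once it reaches Python, and how it could be consumed. The sample values and the helper names `total_price` and `product_names` are invented for illustration; only the field names come from the query.

```python
# Hypothetical response for the data extraction query above.
# The values are made up; a real response depends on the page being queried.
sample_response = {
    "price_currency": "USD",
    "products": [
        {"name": "Bulbasaur", "price": 63},
        {"name": "Ivysaur", "price": 87},
    ],
}


def total_price(response: dict) -> int:
    """Sum the prices of every entry in the products[] list."""
    return sum(item["price"] for item in response["products"])


def product_names(response: dict) -> list:
    """Return product names in the order they appear in the response."""
    return [item["name"] for item in response["products"]]


print(total_price(sample_response), product_names(sample_response))
```

Because `price(integer)` asks for integers, the prices can be summed directly without parsing currency strings.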

Natural Language Prompt

For element location, you can use free-form natural language prompts:

NATURAL_LANGUAGE_PROMPT = "Button to display Qwilfish page"
qwilfish_page_btn = page.get_by_prompt(NATURAL_LANGUAGE_PROMPT)

This approach finds elements based on semantic understanding rather than structural selectors.

Source: examples/python/first_steps/main.py:42-47

Common Use Cases

Collecting List Data

Extract multiple items from a page, such as product listings or search results:

PRODUCT_DATA_QUERY = """
{
    products[] {
        name
        price
        link
    }
}
"""
data = page.query_data(PRODUCT_DATA_QUERY)

Source: examples/python/list_query_usage/README.md

Handling Pagination

Step through multiple pages to collect large datasets:

// Collect HackerNews headlines across paginated pages
// `page` is an AgentQL-wrapped Playwright page; HEADLINES_QUERY is defined elsewhere
async function collectHeadlines(page, url, numPages) {
  const headlines = [];
  for (let i = 0; i < numPages; i++) {
    await page.goto(url + `?p=${i + 1}`);
    const data = await page.queryData(HEADLINES_QUERY);
    headlines.push(...data.headlines);
  }
  return headlines;
}

Source: examples/js/collect-paginated-news-headlines/README.md

Form Automation

Fill out and submit forms using natural language queries:

const FORM_QUERY = `
{
    username_field
    password_field
    submit_button
}
`;
const form = await page.queryElements(FORM_QUERY);
await form.username_field.fill('user@example.com'); // placeholder address
await form.submit_button.click();

Source: examples/js/submit-form/README.md

E-commerce Data Collection

Extract pricing and product information from online stores:

PRODUCT_DATA_QUERY = """
{
    price_currency
    products[] {
        name
        price(integer)
    }
}
"""
response = page.query_elements(SEARCH_BOX_QUERY)
response.search_product_box.type(search_key_word, delay=200)
page.keyboard.press("Enter")
data = page.query_data(PRODUCT_DATA_QUERY)

Source: examples/python/first_steps/main.py:31-60

Waiting for Page Load

Ensure pages fully load before querying:

await page.goto(url);
// Wait for network idle and dynamic content
await page.waitForLoadState('networkidle');
const data = await page.queryData(DATA_QUERY);

Source: examples/js/wait-for-entire-page-load/README.md
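
The same pattern can be sketched in Python. This assumes `page` is a synchronous Playwright page wrapped with agentql.wrap(); `wait_for_load_state` is Playwright's Python counterpart of `waitForLoadState`. The helper name `load_and_query` and the `StubPage` class are hypothetical, and the stub exists only so the sketch runs without a browser.

```python
def load_and_query(page, url: str, query: str):
    """Navigate, wait for the network to go quiet, then run a data query."""
    page.goto(url)
    # Give dynamically loaded content a chance to arrive before querying.
    page.wait_for_load_state("networkidle")
    return page.query_data(query)


class StubPage:
    """Stand-in for a wrapped page that records the calls it receives."""

    def __init__(self):
        self.calls = []

    def goto(self, url):
        self.calls.append(("goto", url))

    def wait_for_load_state(self, state):
        self.calls.append(("wait_for_load_state", state))

    def query_data(self, query):
        self.calls.append(("query_data", query))
        return {"title": "example"}


stub = StubPage()
result = load_and_query(stub, "https://example.com", "{ title }")
```

With a real wrapped page, the call would be `load_and_query(page, url, DATA_QUERY)`.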

Integration Patterns

With AI Agents

AgentQL is designed for AI agent workflows. The framework allows agents to:

  1. Navigate to any URL
  2. Query elements using natural language
  3. Extract structured data
  4. Perform actions (click, type, scroll)

graph LR
    A[AI Agent] -->|Instruction| B[AgentQL SDK]
    B -->|Query| C[Web Page]
    C -->|Data| D[Structured Output]
    D -->|Analysis| A
    A -->|Action| B

Source: README.md

Cloudflare Workers Consideration

Users have explored using AgentQL with Cloudflare's Browser Rendering for edge environments. However, edge environments may have limitations with certain Node.js APIs that AgentQL depends on. See Issue #128 for community discussion on this integration pattern.

Source: Community Issue #128

Getting Started

Prerequisites

  • Python 3.8+ or Node.js 18+
  • Playwright installed

Installation

Python:

pip install agentql
playwright install chromium

JavaScript:

npm install agentql
npx playwright install chromium

Quick Start Steps

  1. Install the AgentQL SDK for your language
  2. Launch a browser with Playwright
  3. Wrap the page object with agentql.wrap()
  4. Write your first AgentQL query
  5. Use query_elements() for actions or query_data() for extraction
  6. Optional: Install the AgentQL Debugger Chrome Extension to test queries on live sites

Testing Your Queries

The AgentQL Playground at playground.agentql.com allows you to:

  • Test queries on live websites
  • Export working Python/JavaScript scripts
  • Optimize query patterns

Source: README.md

Community Resources

| Resource | Link |
| --- | --- |
| Documentation | docs.agentql.com |
| Discord Community | discord.gg/agentql |
| X (Twitter) | @agentql |
| LinkedIn | tinyfish-ai |
| Deep-dive Article | Starlog Analysis |

Known Limitations

  • Element resolution may occasionally return generic containers instead of specific elements (see Issue #121)
  • Edge environment compatibility requires additional configuration for Cloudflare Workers (Issue #128)

Summary

AgentQL bridges the gap between LLMs and web automation by providing a natural language query interface that abstracts away brittle CSS/XPath selectors. Its dual Python and JavaScript SDKs integrate with Playwright, making it accessible for both backend automation scripts and modern web agent frameworks. The structured output capability, combined with transforms and cross-site compatibility, makes AgentQL a robust choice for building maintainable web scraping and automation solutions.

Source: https://github.com/tinyfish-io/agentql / Human Manual

Quick Start Guide

Related topics: Python SDK, JavaScript SDK

AgentQL is a query language and SDK designed to connect LLMs and AI agents to the web. This guide provides everything you need to start using AgentQL within 5 minutes, whether you're using Python or JavaScript.

Prerequisites

Before beginning, ensure you have the following installed:

| Requirement | Version | Purpose |
| --- | --- | --- |
| Python | 3.8+ | For Python SDK usage |
| Node.js | 18+ | For JavaScript SDK usage |
| Playwright | Latest | Browser automation |
| AgentQL SDK | Latest | Core library |

Python SDK Installation

Install the AgentQL Python SDK using pip:

pip install agentql

Install Playwright with the required browsers:

pip install playwright
playwright install chromium

JavaScript SDK Installation

Install the AgentQL JavaScript SDK using npm:

npm install agentql
npx playwright install chromium

Core Concepts

Understanding these fundamental concepts will help you write effective AgentQL queries:

AgentQL Query Language

AgentQL uses a GraphQL-like query syntax to describe what data to extract or which elements to interact with on a web page. Field names read as short natural-language descriptions, which makes queries intuitive and self-documenting.

{
    search_product_box
    products[] {
        name
        price(integer)
    }
}

Source: examples/python/first_steps/main.py:29-36

Smart Locator vs Data Query API

AgentQL provides two distinct APIs:

| API Type | Method | Purpose |
| --- | --- | --- |
| Smart Locator | query_elements() | Locate elements for interaction |
| Data Query | query_data() | Extract structured data |

Your First Script

Python Quick Start

Create a new file named main.py and add the following code:

#!/usr/bin/env python3
import agentql
from agentql.ext.playwright.sync_api import Page
from playwright.sync_api import sync_playwright

URL = "https://scrapeme.live/shop"

# Query to locate the search box element
SEARCH_BOX_QUERY = """
{
    search_product_box
}
"""

# Query for data extraction
PRODUCT_DATA_QUERY = """
{
    price_currency
    products[] {
        name
        price(integer)
    }
}
"""

def main():
    with sync_playwright() as playwright, playwright.chromium.launch(headless=False) as browser:
        page = agentql.wrap(browser.new_page())
        page.goto(URL)
        
        product_data = page.query_data(PRODUCT_DATA_QUERY)
        print(product_data)

if __name__ == "__main__":
    main()

Source: examples/python/first_steps/main.py:1-45

Run the script:

python3 main.py

JavaScript Quick Start

Create a new file named main.js:

const agentql = require('agentql');
const { chromium } = require('playwright');

const URL = "https://scrapeme.live/shop";

const PRODUCT_QUERY = `
{
    price_currency
    products[] {
        name
        price(integer)
    }
}
`;

async function main() {
    const browser = await chromium.launch({ headless: false });
    // newPage() returns a promise, so await it before wrapping
    const page = await agentql.wrap(await browser.newPage());
    
    await page.goto(URL);
    const productData = await page.queryData(PRODUCT_QUERY);
    console.log(productData);
    
    await browser.close();
}

main();

Source: examples/js/collect-paginated-news-headlines/README.md:18-36

Run the script:

node main.js

Workflow Overview

graph TD
    A[Install AgentQL SDK] --> B[Import AgentQL Library]
    B --> C[Launch Browser with Playwright]
    C --> D[Wrap Page with AgentQL]
    D --> E[Write AgentQL Query]
    E --> F[Execute Query]
    F --> G[Process Results]
    G --> H[Close Browser]

Common Usage Patterns

Extracting Paginated Data

To collect data across multiple pages, use a loop with navigation:

import agentql
from playwright.async_api import async_playwright

# QUERY is an AgentQL query string that returns an "items" list
async def collect_paginated_news():
    # The async API is required here because the function awaits each call
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        page = await agentql.wrap_async(browser.new_page())
        
        all_items = []
        for page_num in range(3):  # Collect 3 pages
            await page.goto(f"https://news.ycombinator.com?p={page_num + 1}")
            data = await page.query_data(QUERY)
            all_items.extend(data.get("items", []))
        
        await browser.close()
        return all_items

Source: examples/python/collect_paginated_news_headlines/README.md:1-22
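
The loop-and-merge logic can also be separated from the browser so it is easy to test. The sketch below is one possible arrangement, not the SDK's own helper: `collect_items`, `fake_fetch`, and the `?p=` URL scheme are assumptions for illustration, and `fake_fetch` stands in for a function that navigates and calls query_data().

```python
from typing import Callable, Dict, List


def collect_items(
    fetch_page: Callable[[str], Dict],
    base_url: str,
    num_pages: int,
    key: str = "items",
) -> List[Dict]:
    """Call fetch_page once per page URL and merge the lists found under `key`."""
    results: List[Dict] = []
    for page_num in range(1, num_pages + 1):
        data = fetch_page(f"{base_url}?p={page_num}")
        items = data.get(key) or []
        if not items:
            # Stop early once a page comes back empty.
            break
        results.extend(items)
    return results


def fake_fetch(url: str) -> Dict:
    """Stand-in for goto() + query_data(); pages 3 and later are empty."""
    page_num = int(url.rsplit("=", 1)[1])
    if page_num >= 3:
        return {"items": []}
    return {"items": [{"title": f"story {page_num}"}]}


headlines = collect_items(fake_fetch, "https://news.ycombinator.com/", 5)
```

Stopping on the first empty page avoids extra requests when fewer pages exist than were requested.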

Multi-URL Data Collection

Fetch data from multiple websites concurrently using async patterns:

import asyncio
import agentql
from agentql.ext.playwright.async_api import Page
from playwright.async_api import async_playwright

WEBSITE_URLS = [
    "https://duckduckgo.com/?q=agents+for+the+web&t=h_&iar=news&ia=news",
]

async def main():
    async with async_playwright() as p:
        async with await p.chromium.launch(headless=True) as browser:
            async with await browser.new_context() as context:
                await asyncio.gather(
                    *(fetch_data(context, url) for url in WEBSITE_URLS)
                )

async def fetch_data(context, session_url):
    page = await agentql.wrap_async(await context.new_page())
    await page.goto(session_url)
    data = await page.query_data(QUERY)
    return data

Source: examples/python/news-aggregator/main.py:17-36
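
When the URL list grows, it can help to cap how many pages are open at once and to keep one failed site from cancelling the rest. The sketch below shows that pattern with plain asyncio; `gather_limited` and `fake_fetch` are hypothetical names, and `fake_fetch` stands in for a coroutine such as fetch_data above.

```python
import asyncio


async def gather_limited(urls, fetch, max_concurrency=3):
    """Run fetch(url) for every URL with at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(url):
        async with semaphore:
            return await fetch(url)

    # return_exceptions=True stores failures as values instead of raising.
    results = await asyncio.gather(
        *(guarded(url) for url in urls), return_exceptions=True
    )
    return dict(zip(urls, results))


async def fake_fetch(url):
    """Stand-in for a real page fetch; fails for URLs containing 'bad'."""
    await asyncio.sleep(0)
    if "bad" in url:
        raise RuntimeError("navigation failed")
    return {"url": url, "items": []}


demo = asyncio.run(
    gather_limited(["https://a.example", "https://bad.example"], fake_fetch)
)
```

Each value in the returned dictionary is either the fetched data or the exception raised for that URL, so the caller can decide how to handle partial failures.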

Synchronous vs Asynchronous Execution

AgentQL supports both synchronous and asynchronous patterns:

| Pattern | Use Case | API |
| --- | --- | --- |
| Synchronous | Simple scripts, sequential operations | agentql.wrap() |
| Asynchronous | Concurrent operations, better performance | agentql.wrap_async() |

Synchronous example:

import agentql
from playwright.sync_api import sync_playwright

# URL and QUERY are defined elsewhere in the script
def main():
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = agentql.wrap(browser.new_page())
        page.goto(URL)
        data = page.query_data(QUERY)
        browser.close()

Source: examples/python/news-aggregator/main_sync.py:17-27

Running Examples in Google Colab

You can run AgentQL examples directly in Google Colab without local installation:

  1. Navigate to the Google Colab example
  2. Open main.ipynb in Colab
  3. Run cells sequentially

This approach is useful for quick experimentation without setting up a local environment.

Writing Effective Queries

Querying Lists

Use array syntax [] to query multiple elements:

{
    products[] {
        name
        price
        description
    }
}

Data Type Transformations

Apply type conversions within queries:

{
    products[] {
        name
        price(integer)  # Convert to integer
        rating(float)   # Convert to float
    }
}

Source: examples/python/first_steps/main.py:34-36

Natural Language Prompts

For element location, use natural language prompts:

NATURAL_LANGUAGE_PROMPT = "Button to display Qwilfish page"

This allows flexible element selection based on descriptive intent rather than CSS selectors.

Troubleshooting Common Issues

Element Resolution Problems

Elements may occasionally resolve to a generic container (reported as a "useless span") instead of the expected element. See the linked issue for the report and discussion.

Source: issues/tinyfish-io/agentql#121

Cloudflare Browser Rendering

When using AgentQL with Cloudflare's Browser Rendering:

  • Edge environments may have Node.js API limitations
  • Some synchronous Playwright APIs may not be available
  • Consider using async patterns for edge compatibility

Source: issues/tinyfish-io/agentql#128

Next Steps

After completing this quick start guide:

| Resource | Description |
| --- | --- |
| AgentQL Query Language | Deep dive into query syntax |
| Python SDK Reference | Complete API documentation |
| JavaScript SDK Reference | JS API documentation |
| Examples Repository | Full example collection |
| Discord Community | Get help and share feedback |

Key Takeaways

  1. Installation is straightforward - A single package install gets you started
  2. Two API modes - Choose sync for simplicity or async for performance
  3. Natural language queries - Write queries that describe intent, not selectors
  4. Structured output - Data returns in the shape you define in your query
  5. Cross-site compatibility - Queries work across similar sites with comparable content

Get started in 5 minutes by running the example scripts above, then explore the official documentation for advanced features and integrations.

Python SDK

Related topics: JavaScript SDK, Browser Modes and Configuration

The AgentQL Python SDK provides a powerful interface for connecting LLMs and AI agents to the web through structured data queries and intelligent element location. Built as a wrapper around Microsoft Playwright, the SDK enables developers to extract structured data, interact with web elements, and automate browser workflows using AgentQL's query language and natural language prompts.

Overview

The Python SDK serves as the primary programming interface for Python developers building web automation, data extraction, and AI agent applications. It wraps Playwright's Page objects to provide AgentQL-specific querying capabilities while maintaining full access to Playwright's browser automation features.

Key Capabilities

| Capability | Description |
| --- | --- |
| Structured Data Extraction | Query web pages using AgentQL's query language to extract typed, structured data |
| Natural Language Element Selection | Locate elements using intuitive prompts instead of CSS selectors |
| Cross-Site Compatibility | Write queries once and use them across similar websites |
| Dual API Support | Available in both synchronous and asynchronous implementations |
| Playwright Integration | Full access to Playwright's browser automation features |

Source: README.md:1-15

Installation

Prerequisites

  • Python 3.12 or later (Python 3.13 recommended)
  • Playwright browser binaries installed

Installation via pip

pip install agentql

Browser Binary Setup

After installing the SDK, initialize Playwright browsers:

playwright install chromium

The SDK is tested and recommended with Python 3.13 on a Debian 12 (Bookworm) slim base image, and with Playwright v1.58.2 on Ubuntu 24.04 LTS.

Source: golden-images.yaml:1-30

Core API Methods

Wrapping a Page Object

To access AgentQL's querying capabilities, wrap a Playwright page object using agentql.wrap():

import agentql
from playwright.sync_api import sync_playwright

with sync_playwright() as playwright:
    browser = playwright.chromium.launch(headless=True)
    page = agentql.wrap(browser.new_page())

Source: examples/python/first_steps/main.py:35-39

query_data()

Extracts structured data from the page using an AgentQL query. Returns a dictionary matching the query structure.

PRODUCT_DATA_QUERY = """
{
    price_currency
    products[] {
        name
        price(integer)
    }
}
"""

data = page.query_data(PRODUCT_DATA_QUERY)
print(data)

Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| query | str | AgentQL query defining the data structure to extract |
| timeout | int | Maximum wait time in milliseconds (default: 30000) |

Returns: Dictionary with keys matching the query fields

Source: examples/python/first_steps/main.py:30-34

query_elements()

Locates DOM elements matching an AgentQL query, returning element references that can be interacted with using Playwright's API.

SEARCH_BOX_QUERY = """
{
    search_product_box
}
"""

response = page.query_elements(SEARCH_BOX_QUERY)
response.search_product_box.type("fish", delay=200)
page.keyboard.press("Enter")

Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| query | str | AgentQL query defining elements to locate |
| timeout | int | Maximum wait time in milliseconds (default: 30000) |

Returns: Object with attributes matching query field names, containing Playwright Locator objects

Source: examples/python/first_steps/main.py:52-59

get_by_prompt()

Locates elements using natural language prompts. This method uses AI to find elements based on their semantic meaning rather than DOM structure.

# Locate the search bar using natural language
search_bar = page.get_by_prompt("the search bar")
search_bar.fill("AgentQL")

# Click a button using a description
page.get_by_prompt("the search button").click()

Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| prompt | str | Natural language description of the element |
| timeout | int | Maximum wait time in milliseconds (default: 30000) |

Returns: Playwright Locator object for the matched element, or None if not found

Source: examples/python/get_by_prompt/main.py:18-26
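
Since get_by_prompt() is described as returning None when nothing matches, chained calls such as `page.get_by_prompt(...).click()` can fail with an AttributeError. A minimal guard might look like the sketch below; `click_if_found` is a hypothetical helper, and the stub classes exist only so the sketch runs without a browser.

```python
def click_if_found(page, prompt: str) -> bool:
    """Click the element matching `prompt`; return False if nothing matched."""
    element = page.get_by_prompt(prompt)
    if element is None:
        return False
    element.click()
    return True


class StubElement:
    """Stand-in for a locator that counts clicks."""

    def __init__(self):
        self.clicks = 0

    def click(self):
        self.clicks += 1


class StubPage:
    """Stand-in for a wrapped page that knows a single prompt."""

    def __init__(self):
        self.button = StubElement()

    def get_by_prompt(self, prompt):
        return self.button if prompt == "the search button" else None


stub_page = StubPage()
found = click_if_found(stub_page, "the search button")
missing = click_if_found(stub_page, "a button that does not exist")
```

Returning a boolean lets the calling script log the miss or try a different prompt instead of crashing.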

Asynchronous API

For applications requiring concurrent operations, use the async API with async_playwright and agentql.wrap_async():

import asyncio
import agentql
from playwright.async_api import async_playwright

async def main():
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        async with await browser.new_context() as context:
            page = await agentql.wrap_async(context.new_page())
            await page.goto("https://example.com")
            data = await page.query_data(QUERY)

Source: examples/python/news-aggregator/main.py:28-38

Concurrent Page Operations

The async API enables concurrent data fetching from multiple pages:

async def main():
    async with async_playwright() as p, await p.chromium.launch(headless=True) as browser:
        async with await browser.new_context() as context:
            # Each URL gets its own page; all fetches run concurrently
            await asyncio.gather(
                *(fetch_data(context, url) for url in WEBSITE_URLS)
            )

async def fetch_data(context, url):
    page = await agentql.wrap_async(context.new_page())
    await page.goto(url)
    data = await page.query_data(QUERY)
    return data

Source: examples/python/news-aggregator/main.py:28-44

Common Usage Patterns

E-commerce Data Extraction

Extract product information from e-commerce websites:

QUERY = """
{
    products[]
    {
        name
        price(integer)
    }
}
"""

page.goto("https://scrapeme.live/shop")
response = page.query_data(QUERY)

# Write to CSV
with open("product_data.csv", "w", encoding="utf-8") as file:
    file.write("Name, Price\n")
    for product in response["products"]:
        file.write(f"{product['name']},{product['price']}\n")

Source: examples/python/list_query_usage/main.py:14-30
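
Writing rows with string formatting breaks when a product name itself contains a comma. As an alternative sketch, Python's standard csv module handles quoting automatically. The function name `products_to_csv` and the sample data are made up for illustration; the response shape follows the query above.

```python
import csv
import io


def products_to_csv(response: dict) -> str:
    """Serialize a products[] response to CSV text with proper quoting."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Name", "Price"])
    for product in response["products"]:
        writer.writerow([product["name"], product["price"]])
    return buffer.getvalue()


# Hypothetical response; the comma in the name forces the field to be quoted.
sample = {"products": [{"name": "Mr. Mime, Jr.", "price": 12}]}
csv_text = products_to_csv(sample)
print(csv_text)
```

To write to disk instead, the same writer can be pointed at a file opened with `newline=""`, as the csv module documentation recommends.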

Multi-Site Price Comparison

Compare product prices across different websites using the same query:

PRODUCT_INFO_QUERY = """
{
    nintendo_switch_price
}
"""

page.goto(NINTENDO_URL)
response = page.query_data(PRODUCT_INFO_QUERY)
print("Price at Nintendo: ", response["nintendo_switch_price"])

page.goto(TARGET_URL)
response = page.query_data(PRODUCT_INFO_QUERY)
print("Price at Target: ", response["nintendo_switch_price"])

Source: examples/python/compare_product_prices/main.py:20-31
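
Once the prices are collected, picking the best offer is a small post-processing step. The sketch below assumes the prices have been gathered into a dictionary keyed by store, with None for a store where the query found no price; `cheapest` and the sample figures are hypothetical.

```python
def cheapest(prices: dict):
    """Return (store, price) for the lowest available price, or None if none."""
    available = {store: price for store, price in prices.items() if price is not None}
    if not available:
        return None
    store = min(available, key=available.get)
    return store, available[store]


# Made-up numbers for illustration only.
best = cheapest({"Nintendo": 299.99, "Target": 289.99, "Other": None})
print(best)
```

Note that a plain query such as `nintendo_switch_price` may return text rather than a number, so a type annotation in the query or a conversion step would be needed before comparing values.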

List Data Extraction

Query lists of items on a page:

QUERY = """
{
    listings[]
    {
        name
        rating
        description
        order_link
        take_out_link
        address
        hours
    }
}
"""

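response = page.query_data(QUERY)

# CSV_FILE_PATH is a placeholder for the output file path
with open(CSV_FILE_PATH, "w", encoding="utf-8") as file:
    for listing in response["listings"]:
        file.write(
            f"{listing['name']},{listing['rating']},{listing['description']}...\n"
        )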

Source: examples/python/maps_scraper/main.py:1-15

Paginated Data Collection

Collect data across multiple pages by navigating through pagination:

all_results = []  # accumulates items from every page
for page_num in range(num_pages):
    page.goto(f"{BASE_URL}&page={page_num}")
    data = page.query_data(QUERY)
    all_results.extend(data["items"])

Source: examples/python/collect_paginated_news_headlines/README.md:1-20

Data Transformations

AgentQL queries support inline transformations to format extracted data:

QUERY = """
{
    items[]{
        published_date(convert to XX/XX/XXXX format)
        entry(title or post if no title is available)
        author(person's name; return "n/a" if not available)
        outlet(the original platform it is posted on)
        url
    }
}
"""

The SDK supports:

  • Type conversions (e.g., price(integer))
  • Date format transformations
  • Default values for missing fields
  • Conditional extraction logic

Source: examples/python/news-aggregator/main_sync.py:20-28
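
In-query defaults such as `return "n/a" if not available` arrive as ordinary strings, so downstream code may want to turn them back into real missing values. The sketch below shows one way to do that after extraction; `normalize_item` and the marker list are assumptions, not part of the SDK.

```python
def normalize_item(item: dict, missing_markers=("n/a", "")) -> dict:
    """Replace placeholder strings such as "n/a" with None."""
    cleaned = {}
    for key, value in item.items():
        if isinstance(value, str) and value.strip().lower() in missing_markers:
            cleaned[key] = None
        else:
            cleaned[key] = value
    return cleaned


# Hypothetical item shaped like one entry of the items[] list above.
raw_item = {"author": "N/A", "entry": "Agents for the web", "url": "https://example.com/post"}
clean_item = normalize_item(raw_item)
```

Applying this to every entry, for example `[normalize_item(i) for i in data["items"]]`, keeps the placeholder convention inside the query while giving the rest of the program a conventional None.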

Advanced Configuration

Headless Mode

Run browsers in headless mode for server-side or CI environments:

with sync_playwright() as playwright:
    playwright.chromium.launch(headless=True)  # Default for CI/CD

For debugging, disable headless mode:

with sync_playwright() as playwright:
    playwright.chromium.launch(headless=False)  # Visible browser

Browser Contexts

Use browser contexts to isolate sessions, cookies, and state:

async with await browser.new_context() as context:
    # Each context has independent storage
    page = await agentql.wrap_async(context.new_page())

Logging Configuration

Configure logging for debugging and monitoring:

import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger(__name__)

log.info("All done! CSV is here: %s", CSV_FILE_PATH)

Source: examples/python/news-aggregator/main.py:14-15

Architecture

graph TD
    A[Python Application] --> B[AgentQL SDK]
    B --> C[Playwright API]
    C --> D[Browser Instance]
    E[AgentQL Query Language] --> B
    F[Natural Language Prompts] --> B
    G[Web Page DOM] --> D
    D --> H[Structured Data Response]
    B --> H
    
    subgraph "AgentQL SDK Components"
        B
        I[query_data method]
        J[query_elements method]
        K[get_by_prompt method]
    end
    
    I --> B
    J --> B
    K --> B

Relationship to JavaScript SDK

The Python SDK mirrors the JavaScript SDK's API, using snake_case names in Python and camelCase names in JavaScript. The JavaScript examples in this manual use a single awaited wrap() call, so it is listed for both wrap variants:

| Feature | Python SDK | JavaScript SDK |
| --- | --- | --- |
| Wrap Page | agentql.wrap(page) | await agentql.wrap(page) |
| Async Wrap | agentql.wrap_async(page) | await agentql.wrap(page) |
| Query Data | page.query_data(QUERY) | page.queryData(QUERY) |
| Query Elements | page.query_elements(QUERY) | page.queryElements(QUERY) |
| By Prompt | page.get_by_prompt("text") | page.getByPrompt("text") |

Both SDKs use the same AgentQL query language and provide equivalent functionality for their respective platforms.

JavaScript SDK

Related topics: Python SDK, REST API

The AgentQL JavaScript SDK enables developers to build web automation and scraping applications using natural language queries. It provides a seamless integration with Playwright, allowing JavaScript and Node.js developers to leverage AgentQL's query language for extracting structured data from web pages.

Overview

The JavaScript SDK wraps Playwright's browser automation capabilities with AgentQL's intelligent querying layer. This combination allows developers to:

  • Query web pages using natural language descriptions
  • Extract structured data without relying on CSS selectors or XPath
  • Build resilient automation scripts that adapt to UI changes
  • Execute queries across multiple browser contexts simultaneously

Source: examples/js/package.json:1-28

SDK Dependencies

| Dependency | Version | Purpose |
| --- | --- | --- |
| agentql | latest | Core SDK package |
| playwright | ^1.48.2 | Browser automation framework |
| playwright-dompath | ^0.0.7 | DOM path resolution |
| openai | ^4.70.1 | LLM integration for query processing |

Source: examples/js/package.json:18-22

Installation

Prerequisites

  • Node.js environment
  • Playwright browsers installed

Setup

const { wrap, configure } = require('agentql');
const { chromium } = require('playwright');

Configuration

Configure the SDK with your API key:

configure({
  apiKey: process.env.AGENTQL_API_KEY, // Optional, uses default if omitted
});

Source: examples/js/get-by-prompt/main.js:10-12
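
The snippet above reads the key from the AGENTQL_API_KEY environment variable. A script in any language can check for that variable before launching a browser, so that a missing key surfaces as a clear message rather than a failed query. The Python sketch below illustrates the idea; `require_api_key` is a hypothetical helper and not part of either SDK.

```python
import os


def require_api_key(env=None) -> str:
    """Return the AgentQL API key from the environment or raise a clear error."""
    env = os.environ if env is None else env
    key = env.get("AGENTQL_API_KEY", "").strip()
    if not key:
        raise RuntimeError("AGENTQL_API_KEY is not set")
    return key


# Passing a dictionary instead of os.environ keeps the check easy to test.
demo_key = require_api_key({"AGENTQL_API_KEY": " demo-key "})
```

Accepting the environment as a parameter is a design choice that avoids mutating the real process environment in tests.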

Core API

Wrapping a Playwright Page

The wrap() function transforms a standard Playwright Page object into an AgentQL-enabled page that supports natural language queries:

const { wrap } = require('agentql');
const { chromium } = require('playwright');

async function main() {
  const browser = await chromium.launch({ headless: false });
  const page = await wrap(await browser.newPage());
  
  await page.goto('https://example.com');
  
  // Now page has AgentQL query capabilities
}

Source: examples/js/get-by-prompt/main.js:14-17

getByPrompt Method

The getByPrompt() method locates elements using natural language descriptions. This is the primary way to interact with page elements:

// Locate a sign up button by describing what it does
const signUpBtn = await page.getByPrompt('Sign up button');

// Click the element if found
if (signUpBtn) {
  await signUpBtn.click();
}

Source: examples/js/get-by-prompt/main.js:24-30

queryData Method

The queryData() method extracts structured data from the page using AgentQL's query language:

// The constant name must match the identifier passed to queryData()
const QUERY = `
{
    products[] {
        name
        model
        sku
        price(integer)
    }
}
`;

const data = await page.queryData(QUERY);
console.log(data.products);

Source: examples/js/collect-pricing-data/main.js:12-23

AgentQL Query Language

The query language uses a GraphQL-like syntax to define the structure of desired data. Queries are processed by LLMs to find matching elements on the page.

Basic Query Structure

const query = `
{
    items[]
    {
        published_date
        entry
        author
        outlet
        url
    }
}
`;

Source: examples/js/news-aggregator/main.js:10-18

List Extraction

Use the [] notation to query arrays of items:

const query = `
{
    products[] {
        name
        price
    }
}
`;

Source: examples/js/collect-pricing-data/main.js:12-18

Data Transforms

Apply transforms within queries to modify extracted values:

const query = `
{
    items[] {
        published_date(convert to XX/XX/XXXX format)
        entry(title or post if no title is available)
    }
}
`;

Source: examples/js/news-aggregator/main.js:10-15

Type Conversions

Specify data types for extracted values:

const query = `
{
    products[] {
        name
        price(integer)
    }
}
`;

Source: examples/js/collect-pricing-data/main.js:15-17

Fallback Values

Handle missing data gracefully:

const query = `
{
    items[] {
        author(person's name; return "n/a" if not available)
        outlet(the original platform it is posted on; if no platform is listed, use the root domain of the url)
    }
}
`;

Source: examples/js/news-aggregator/main.js:15-18
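
Because a fallback such as "n/a" comes back as an ordinary string, downstream code may want to convert it into a real missing value before storing or comparing results. The following is a minimal Python sketch, assuming the query result is a list of item dictionaries shaped like the query above. `normalize_missing` is a hypothetical helper for illustration and is not part of the AgentQL SDK.

```python
def normalize_missing(items, sentinel="n/a"):
    """Replace sentinel strings (case-insensitive) with None in each item.

    Hypothetical post-processing helper; not part of the AgentQL SDK.
    """
    marker = sentinel.strip().lower()
    cleaned = []
    for item in items:
        cleaned.append({
            key: None
            if isinstance(value, str) and value.strip().lower() == marker
            else value
            for key, value in item.items()
        })
    return cleaned


if __name__ == "__main__":
    sample = [
        {"author": "n/a", "outlet": "dev.to"},
        {"author": "Ada", "outlet": "N/A"},
    ]
    print(normalize_missing(sample))
```

Keeping the sentinel in the query and normalizing afterwards means the query stays readable while the stored data uses a single representation for missing values.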

Common Use Cases

Searching and Filtering

async function searchProduct(page, product, minPrice, maxPrice) {
  // Find search input using natural language
  const searchInput = await page.getByPrompt('the search input field');
  if (!searchInput) {
    console.log('Search input field not found.');
    return false;
  }
  
  // Type with realistic delay
  await searchInput.type(product, { delay: 200 });
  await searchInput.press('Enter');

  // Fill price range filters
  const minPriceInput = await page.getByPrompt('the min price input field');
  if (minPriceInput) {
    await minPriceInput.fill(String(minPrice));
  }

  const maxPriceInput = await page.getByPrompt('the max price input field');
  if (maxPriceInput) {
    await maxPriceInput.fill(String(maxPrice));
    await maxPriceInput.press('Enter');
  }
  return true;
}

Source: examples/js/collect-pricing-data/main.js:27-49

Pagination Handling

async function goToTheNextPage(page) {
  const nextPageQuery = `
    {
        pagination {
            prev_page
            next_page
        }
    }
  `;
  // Query and interact with pagination controls
}

Source: examples/js/collect-pricing-data/main.js:53-63

Multi-Tab Data Collection

const websiteUrls = [
  'https://bsky.app/search?q=agents+for+the+web',
  'https://dev.to/search?q=agents%20for+the+web',
  'https://hn.algolia.com/?query=agents%20for+the+web',
];

async function fetchData(context, sessionUrl) {
  const page = await wrap(await context.newPage());
  await page.goto(sessionUrl);
  const data = await page.queryData(query);
  // Process extracted data
}

// Fetch from multiple URLs concurrently
await Promise.all(websiteUrls.map((url) => fetchData(context, url)));

Source: examples/js/news-aggregator/main.js:26-41

Workflow Diagram

graph TD
    A[Initialize Browser] --> B[Wrap Page with AgentQL]
    B --> C[Configure API Key]
    C --> D[Navigate to URL]
    D --> E[Execute Query or getByPrompt]
    E --> F{Query Type?}
    F -->|Data Extraction| G[queryData returns structured JSON]
    F -->|Element Interaction| H[getByPrompt returns element]
    G --> I[Process Results]
    H --> J[Interact with Element]
    J --> K[Wait for Navigation/Update]
    K --> E
    I --> L[Close Browser]

Configuration Options

Browser Launch Options

const browser = await chromium.launch({ 
  headless: false  // or true for headless mode
});

Source: examples/js/get-by-prompt/main.js:15

Browser Context Options

const context = await browser.newContext();
// Create multiple pages within the same context for concurrent operations
const page1 = await context.newPage();
const page2 = await context.newPage();

Source: examples/js/news-aggregator/main.js:27-31

Development Tools

Linting and Formatting

The SDK project includes pre-configured linting and formatting:

# Run ESLint
npm run lint

# Run Prettier
npm run format

Source: examples/js/package.json:7-10

Available Dev Dependencies

| Package | Version | Purpose |
| --- | --- | --- |
| eslint | ^8.57.0 | JavaScript linting |
| eslint-config-prettier | ^9.1.0 | Disables ESLint rules that conflict with Prettier |
| prettier | ^2.8.7 | Code formatting |
| @trivago/prettier-plugin-sort-imports | ^4.3.0 | Import sorting |

Source: examples/js/package.json:11-15

Security Overrides

The SDK includes dependency version overrides for security patches:

"overrides": {
  "axios": "^1.15.0",
  "flatted": "^3.4.2",
  "follow-redirects": "^1.16.0",
  "lodash": "^4.18.0",
  "minimatch": "^3.1.3"
}

Source: examples/js/package.json:23-28

Known Limitations

Cloudflare Browser Rendering Compatibility

There is an open issue regarding compatibility with Cloudflare's Browser Rendering in edge environments. Cloudflare Workers use a restricted Node.js runtime that may not fully support all Playwright and AgentQL features. Developers targeting Cloudflare Workers should be aware of potential limitations with browser instance access.

Source: GitHub Issue #128

Element Resolution Edge Cases

In some cases, elements may be resolved as generic containers (e.g., <span>) rather than semantic elements. This can affect element location accuracy. When encountering such issues, try using more specific prompt descriptions or combining with Playwright's native locators.

Source: GitHub Issue #121

Additional Resources

| Resource | Description |
| --- | --- |
| Installation Guide | Full SDK installation instructions |
| Query Language Docs | Complete AgentQL query language reference |
| Chrome Extension | Debug and test queries in real-time |
| Playground | Interactive query testing environment |
| Examples Directory | Complete list of JavaScript examples |

Source: https://github.com/tinyfish-io/agentql / Human Manual

REST API

Related topics: Python SDK, JavaScript SDK

AgentQL provides a REST API as an alternative to the Python and JavaScript SDKs for executing queries without requiring a full SDK installation. The REST API enables developers to interact with the AgentQL query engine over HTTP, making it suitable for environments where SDK integration is not practical or for quick prototyping and testing.

Overview

The REST API is one of three tool options provided by AgentQL alongside the Python SDK and JavaScript SDK. It allows executing queries against web pages without needing to set up Playwright or maintain a browser automation environment locally.

Source: README.md

Architecture

graph TD
    A[Client Application] -->|HTTP POST /v1/query-data| B[AgentQL REST API]
    B -->|Parse & Process Query| C[Query Engine]
    C -->|DOM Analysis| D[Web Page Content]
    D -->|Extracted Data| B
    B -->|JSON Response| A
    
    E[SDK Client] -->|Internal Request| B
    B -->|Same Flow| D

When to Use the REST API

| Use Case | Recommended Tool | Notes |
| --- | --- | --- |
| Server-side scraping with Python | Python SDK | Full Playwright integration |
| Browser automation in Node.js | JavaScript SDK | Native async support |
| Quick testing/prototyping | REST API | No SDK installation required |
| Edge environments | REST API | Lightweight HTTP requests only |
| External integrations | REST API | Language-agnostic interface |

Core Capabilities

Query Execution

The REST API supports the same AgentQL query language available in the SDKs. Queries can extract structured data from web pages using natural language selectors and path-based element queries.

Example query structure:

{
    "query": "{ items[] { title price url } }",
    "url": "https://example.com/products"
}

Data Extraction

The API returns structured JSON data matching the shape defined in the query. Lists, nested objects, and type conversions are supported.

SDK vs REST API Comparison

| Feature | Python SDK | JavaScript SDK | REST API |
| --- | --- | --- | --- |
| Browser Automation | Yes | Yes | No |
| Query Execution | Yes | Yes | Yes |
| Installation Required | Yes | Yes | No |
| Authentication Support | Via SDK | Via SDK | Via API Key |
| Real-time Interaction | Yes | Yes | No |
| Pagination Handling | Manual | Manual | Manual |
| Rate Limiting | Client-side | Client-side | Server-enforced |

Source: README.md

Configuration Options

When using the REST API, authentication and request configuration are handled through HTTP headers:

| Parameter | Description | Required |
| --- | --- | --- |
| X-API-Key | API key for authentication | Yes |
| Content-Type | Request payload format (application/json) | Yes |
| Accept | Response format (application/json) | Yes |

SDK Dependencies and Requirements

The SDKs may use REST endpoints internally. Their relevant dependencies are:

JavaScript SDK

Source: examples/js/package.json

{
  "dependencies": {
    "agentql": "latest",
    "playwright": "^1.48.2",
    "playwright-dompath": "^0.0.7"
  }
}

Python SDK

The Python SDK uses Playwright as its underlying browser automation framework and communicates with the AgentQL query service.

Source: examples/python/news-aggregator/main.py

from playwright.async_api import async_playwright
import agentql

Common Usage Patterns

Structured Data Extraction

Both SDK and REST API approaches support extracting structured lists from pages:

Source: examples/python/list_query_usage/main.py

QUERY = """
{
    products[]
    {
        name
        price(integer)
    }
}
"""

Multi-Source Aggregation

The REST API can be called once per source URL to aggregate data from several sites:

Source: examples/python/news-aggregator/main.py

WEBSITE_URLS = [
    "https://bsky.app/search?q=agents+for+the+web",
    "https://dev.to/search?q=agents%20for%20the+web",
    "https://hn.algolia.com/?dateRange=last24h&query=agents%20for%20the%20web",
]

Authentication and Security

The REST API uses API key authentication. Pass the key in the X-API-Key request header:

curl -X POST https://api.agentql.com/v1/query-data \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"query": "{ title }", "url": "https://example.com"}'
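
The same request can be assembled from Python with only the standard library. This is a sketch under stated assumptions: the endpoint and the authentication header are supplied by the caller, so use the values given in the official REST API reference. `build_query_request` is a hypothetical helper name, and the placeholder host below is not a real AgentQL address.

```python
import json
import urllib.request


def build_query_request(endpoint, auth_headers, query, page_url):
    """Build (but do not send) a POST request carrying an AgentQL query.

    `endpoint` and `auth_headers` are caller-supplied assumptions; take the
    real values from the REST API reference.
    """
    body = json.dumps({"query": query, "url": page_url}).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Accept": "application/json",
        **auth_headers,
    }
    return urllib.request.Request(endpoint, data=body, headers=headers, method="POST")


if __name__ == "__main__":
    request = build_query_request(
        "https://api.example.invalid/query",   # placeholder endpoint
        {"Authorization": "Bearer YOUR_API_KEY"},  # placeholder credentials
        "{ title }",
        "https://example.com",
    )
    print(request.get_method(), request.full_url)
```

To send the request, pass it to `urllib.request.urlopen(request)` and decode the response body with `json.load`. Separating construction from sending makes the payload easy to inspect and test without network access.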

Limitations and Considerations

Edge Environment Compatibility

The REST API is particularly useful in edge environments where full SDK installation is not possible. However, issues have been reported when combining JavaScript SDK with Cloudflare's Browser Rendering feature, as some Node.js APIs may not be available in edge runtime environments.

Source: Issue #128: AgentQL (JS) x Cloudflare's Browser Rendering

Element Resolution

When using queries that resolve elements, some elements may be resolved as generic containers (like <span>) rather than the expected semantic elements. This can affect data extraction accuracy.

Source: Issue #121: querying element resolved as useless span

When referencing examples or tutorials, ensure you use the correct documentation paths. Some older links may point to incorrect directories.

Source: Issue #64: Invalid Link | Documentation > Examples > Collab

Integration with Agent Frameworks

The REST API can be integrated with various agent frameworks as a lightweight alternative to SDK-based approaches. External services like run.pay have expressed interest in using AgentQL for autonomous AI agents to perform web interactions.

Source: Issue #153: Monetize AgentQL with run.pay

Source: https://github.com/tinyfish-io/agentql / Human Manual

AgentQL Query Language

Related topics: Query Examples and Patterns

Overview

The AgentQL Query Language is a domain-specific query language designed to extract structured data and locate DOM elements on web pages using natural language descriptions. It serves as the core abstraction layer that enables AI agents and LLMs to interact with web content in a robust, maintainable way that survives UI changes.

AgentQL queries are declarative, resembling a subset of GraphQL syntax, and support both element location and data extraction within a single unified syntax. Source: README.md:1-10

Core Concepts

Query Types

AgentQL distinguishes between two primary query operations:

| Query Type | Purpose | SDK Method | Returns |
| --- | --- | --- | --- |
| Element Query | Locate DOM elements for interaction | query_elements() | Playwright Locator objects |
| Data Query | Extract structured data from the page | query_data() | Dictionary/object with extracted values |

Source: examples/python/first_steps/main.py:35-55

Natural Language Selectors

Unlike traditional CSS selectors or XPath, AgentQL uses natural language to describe what elements or data to find. This approach provides:

  • Intuitive element discovery — Describe elements by their purpose or content rather than markup structure
  • Cross-site compatibility — The same query can work across different websites with similar content
  • Self-healing resilience — When UI structure changes, natural language queries adapt automatically

Source: README.md:8-15

Query Syntax Reference

Basic Structure

Queries are defined as multi-line strings using a GraphQL-like syntax:

{
    element_name
}

Source: examples/python/first_steps/main.py:22-25

Object and Field Selection

Nested objects are queried using brace notation. Fields within objects return their text content or attribute values:

{
    price_currency
    products[] {
        name
        price
    }
}

Source: examples/python/first_steps/main.py:28-35

Array Syntax

The [] suffix denotes arrays/lists of items. This syntax extracts multiple items matching the query pattern:

{
    products[] {
        name
        price
    }
}

Source: examples/python/list_query_usage/README.md:1-15

Transforms

Transforms are applied inline to convert extracted data to specific types or formats. The transform name follows the field in parentheses:

{
    products[] {
        name
        price(integer)
    }
}

In this example, price(integer) instructs AgentQL to extract the price text and convert it to an integer. Source: examples/python/first_steps/main.py:33
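
When the field list is only known at runtime, the query text can be generated instead of hand-written. The sketch below assumes the list-plus-transform syntax shown above. `build_list_query` is a hypothetical helper, not an AgentQL API, and it performs no validation of field names or hints.

```python
def build_list_query(list_name, fields):
    """Return AgentQL query text for a list of items.

    `fields` maps each field name to a transform/hint string, or to None
    when the field needs no transform. Hypothetical helper for illustration.
    """
    lines = ["{", f"    {list_name}[] {{"]
    for name, hint in fields.items():
        lines.append(f"        {name}({hint})" if hint else f"        {name}")
    lines += ["    }", "}"]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_list_query("products", {"name": None, "price": "integer"}))
```

The generated string can be passed to `query_data()` like any literal query.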

Natural Language Prompts

For element location, you can use free-form natural language descriptions via the get_by_prompt() method:

NATURAL_LANGUAGE_PROMPT = "Button to display Qwilfish page"
qwilfish_page_btn = page.get_by_prompt(NATURAL_LANGUAGE_PROMPT)

Source: examples/python/first_steps/main.py:37-39

Usage Patterns

Python SDK Pattern

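import agentql
from playwright.sync_api import sync_playwright

URL = "https://example.com"  # placeholder: replace with the target page

# Query that locates an element for interaction
SEARCH_BOX_QUERY = """
{
    search_product_box
}
"""

# Query that extracts structured data
PRODUCT_DATA_QUERY = """
{
    price_currency
    products[] {
        name
        price(integer)
    }
}
"""

with sync_playwright() as playwright:
    browser = playwright.chromium.launch(headless=False)
    # Wrap the Playwright page to add AgentQL capabilities
    page = agentql.wrap(browser.new_page())
    page.goto(URL)

    # Locate the element, then interact with it through the Playwright API
    response = page.query_elements(SEARCH_BOX_QUERY)
    response.search_product_box.type("fish", delay=200)

    # Extract data
    data = page.query_data(PRODUCT_DATA_QUERY)
    print(data)

    browser.close()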

Source: examples/python/first_steps/main.py:1-60

JavaScript SDK Pattern

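const { wrap } = require('agentql');
const { chromium } = require('playwright');

const URL = 'https://example.com'; // placeholder: replace with the target page

async function main() {
  const browser = await chromium.launch();
  const page = await wrap(await browser.newPage());

  await page.goto(URL);

  // The query syntax is the same as in the Python SDK
  const response = await page.queryData(`
    {
        price_currency
        products[] {
            name
            price
        }
    }
  `);
  console.log(response);

  await browser.close();
}

main();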

Source: examples/js/first-steps/README.md:1-20

Common Use Cases

Collecting Paginated Data

For paginated content, queries can be combined with navigation logic to collect data across multiple pages:

# Extract data from the current page
data = page.query_data(PRODUCT_DATA_QUERY)
all_data.extend(data["products"])

# Navigate to the next page
response = page.query_elements("{ next_page_button }")
response.next_page_button.click()

Source: examples/python/collect_paginated_news_headlines/README.md:1-20

Form Interaction

Queries locate form fields and buttons for automated interaction:

{
    username_field
    password_field
    submit_button
}

Source: examples/js/submit-form/README.md:1-20

Web Scraping with Structured Output

Queries define the exact shape of extracted data:

QUERY = """
{
    price_currency
    products[] {
        name
        price(integer)
    }
}
"""
data = page.query_data(QUERY)
# Returns: { "price_currency": "USD", "products": [{ "name": "Item", "price": 29 }] }

Source: examples/python/collect_ecommerce_pricing_data/README.md:1-20
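
Since extraction is performed by a model, it can be worth checking that the returned data really has the requested shape before using it. The sketch below assumes a response shaped like the comment above. `validate_products` is a hypothetical helper, and silently skipping malformed rows is one possible policy among several.

```python
def validate_products(data):
    """Return only the products that have a name and an integer price.

    Raises ValueError when the top-level shape is wrong. Hypothetical
    helper; not part of the AgentQL SDK.
    """
    products = data.get("products")
    if not isinstance(products, list):
        raise ValueError("expected 'products' to be a list")

    valid = []
    for product in products:
        name = product.get("name")
        price = product.get("price")
        # bool is a subclass of int in Python, so exclude it explicitly
        if name and isinstance(price, int) and not isinstance(price, bool):
            valid.append({"name": str(name), "price": price})
    return valid


if __name__ == "__main__":
    sample = {
        "price_currency": "USD",
        "products": [{"name": "Item", "price": 29}, {"name": "Broken", "price": "29"}],
    }
    print(validate_products(sample))
```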

Architecture

graph TD
    A[Developer writes<br/>AgentQL Query] --> B[AgentQL SDK sends<br/>query to API]
    B --> C[LLM interprets<br/>query semantically]
    C --> D[AgentQL returns<br/>element locators<br/>or extracted data]
    D --> E[SDK provides<br/>typed response]
    E --> F[query_elements<br/>returns Locators]
    E --> G[query_data<br/>returns structured data]
    F --> H[Playwright<br/>interacts with DOM]
    G --> I[Structured dict<br/>for downstream use]
    
    style A fill:#e1f5fe
    style H fill:#fff3e0
    style I fill:#e8f5e9

Key Features Summary

| Feature | Description |
| --- | --- |
| Natural language selectors | Describe elements by purpose, not CSS/XPath |
| Structured output | Define exact data shape in queries |
| Inline transforms | Convert data types during extraction |
| Array support | Query lists with [] syntax |
| Cross-site compatibility | Same queries work across similar sites |
| Self-healing | Queries adapt when UI changes |

Source: README.md:8-15

Integration Points

Playwright Integration

AgentQL wraps Playwright page objects to provide query capabilities while preserving full Playwright API access:

page = agentql.wrap(browser.new_page())
# Use both AgentQL and Playwright methods
response = page.query_elements(QUERY)
response.some_element.click()  # Playwright API
page.keyboard.press("Enter")   # Playwright API

Source: examples/python/first_steps/main.py:41-48

SDK Availability

| SDK | Installation Guide | Use Case |
| --- | --- | --- |
| Python SDK | docs.agentql.com | Automation, scraping |
| JavaScript SDK | docs.agentql.com | Node.js automation |

Source: README.md:20-30

Best Practices

  1. Use descriptive field names — Match query field names to content purpose rather than HTML attributes
  2. Apply transforms early — Convert data types in queries rather than post-processing
  3. Test with debugger extension — Use the AgentQL Debugger Chrome Extension to refine queries interactively
  4. Leverage natural language prompts — For complex element location, get_by_prompt() often provides better resilience than structured queries

Source: examples/python/list_query_usage/README.md:1-20

Debugging Queries

Install the AgentQL Debugger Chrome Extension to:

  • Test queries in real-time on live sites
  • View element matches and confidence scores
  • Export optimized queries to Python or JavaScript

Source: examples/python/first_steps/main.py:1-10

Source: https://github.com/tinyfish-io/agentql / Human Manual

Query Examples and Patterns

Related topics: AgentQL Query Language, Data Collection Patterns

AgentQL provides a powerful query language that enables AI agents and LLMs to interact with web pages in a natural, resilient way. This page covers practical examples and common patterns for writing effective queries to extract data and locate elements on web pages.

Overview

AgentQL queries are structured, GraphQL-like expressions that define what data to extract or what elements to locate on a webpage. The query language supports:

  • Natural language selectors that find elements based on semantic meaning
  • Structured data extraction with typed transformations
  • List/array queries for extracting multiple items
  • Cross-site compatibility for reuse across similar websites

Source: README.md

Core Query Methods

AgentQL provides three API methods for interacting with web pages after wrapping a Playwright page object:

| Method | Purpose | Returns |
| --- | --- | --- |
| query_data() | Extract structured data from the page | Dictionary with extracted fields |
| query_elements() | Locate DOM elements for interaction | Element references for actions |
| get_by_prompt() | Find elements using natural language prompts | Element reference |

Source: examples/python/first_steps/main.py:54-77

Python SDK Usage

import agentql
from playwright.sync_api import sync_playwright

# URL, PRODUCT_DATA_QUERY and SEARCH_BOX_QUERY are module-level constants

def main():
    with sync_playwright() as playwright:
        browser = playwright.chromium.launch()
        page = agentql.wrap(browser.new_page())
        page.goto(URL)

        # Extract data
        data = page.query_data(PRODUCT_DATA_QUERY)

        # Locate elements for interaction
        response = page.query_elements(SEARCH_BOX_QUERY)

        browser.close()

Source: examples/python/first_steps/main.py:1-45

JavaScript SDK Usage

const { wrap } = require('agentql');
const { chromium } = require('playwright');

async function main() {
    const browser = await chromium.launch();
    const page = await wrap(await browser.newPage());
    await page.goto(URL);
    
    const data = await page.queryData(query);
}

Source: examples/js/news-aggregator/main.js:1-20

List Queries

List queries allow extraction of multiple items from a page, such as product listings, news headlines, or any repeating content.

Basic List Query Pattern

Use the [] syntax to query arrays of items:

{
    products[]
    {
        name
        price(integer)
    }
}

Source: examples/python/list_query_usage/main.py:15-21

Python List Query Example

QUERY = """
{
    products[]
    {
        name
        price(integer)
    }
}
"""

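def main():
    # URL and CSV_FILE_PATH are module-level constants
    with sync_playwright() as playwright:
        browser = playwright.chromium.launch()
        page = agentql.wrap(browser.new_page())
        page.goto(URL)

        response = page.query_data(QUERY)

        # Write the extracted products to a CSV file
        with open(CSV_FILE_PATH, "w", encoding="utf-8") as file:
            file.write("Name, Price\n")
            for product in response["products"]:
                file.write(f"{product['name']},{product['price']}\n")

        browser.close()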

Source: examples/python/list_query_usage/main.py:1-40

JavaScript List Query Example

const query = `
{
    items(might be articles, posts, tweets)[]
    {
        published_date(convert to XX/XX/XXXX format)
        entry(title or post if no title is available)
        author(person's name; return "n/a" if not available)
        outlet(the original platform it is posted on)
        url
    }
}
`;

const data = await page.queryData(query);

Source: examples/js/news-aggregator/main.js:10-19

Data Transformations

AgentQL supports inline transformations within queries to convert data types or format values.

Type Conversions

Use (type) syntax to convert extracted values:

| Transformation | Example | Description |
| --- | --- | --- |
| (integer) | price(integer) | Convert string to integer |
| (float) | rating(float) | Convert to decimal number |
| (string) | date(string) | Ensure string output |

Source: examples/python/first_steps/main.py:34-35

Format Instructions

Include format hints directly in the query:

{
    published_date(convert to XX/XX/XXXX format)
    entry(title or post if no title is available)
}

Source: examples/js/news-aggregator/main.js:12-13

Natural Language Element Location

The get_by_prompt() method uses natural language to find elements, making queries resilient to UI changes.

Finding Elements with Prompts

NATURAL_LANGUAGE_PROMPT = "Button to display Qwilfish page"

def _add_qwilfish_to_cart(page: Page):
    """Add Qwilfish to cart with AgentQL Smart Locator API."""
    # Find DOM element using natural language prompt
    qwilfish_page_btn = page.get_by_prompt(NATURAL_LANGUAGE_PROMPT)
    
    # Interact with the element using Playwright API
    qwilfish_page_btn.click()

Source: examples/python/first_steps/main.py:79-88

Handling Dynamic Content

Infinite Scroll Patterns

Pages that load content based on scroll position require simulating scroll events:

def key_press_end_scroll(page):
    """Scroll to the end of the page by pressing End key."""
    page.keyboard.press("End")

def mouse_wheel_scroll(page):
    """Alternative scroll using mouse wheel for different page behaviors."""
    page.mouse.wheel(0, 3000)

Source: examples/python/infinite_scroll/README.md

Note: Scrolling to the end of a page by pressing the End key is not always reliable. Some pages have multiple scrollable areas, or the End key may be mapped to different functions. Test both key_press_end_scroll() and mouse_wheel_scroll() to find what works for your target site.

Paginated Data Collection

For pages with explicit pagination, iterate through pages while collecting data:

async def collect_paginated_data(page, pages_to_collect):
    """Collect data from multiple paginated pages."""
    all_data = []
    
    for page_num in range(pages_to_collect):
        data = await page.query_data(QUERY)
        all_data.extend(data["items"])

        # Navigate to the next page unless this was the last one requested
        if page_num < pages_to_collect - 1:
            await page.click("[aria-label='Next']")
            await page.wait_for_load_state("networkidle")
    
    return all_data

Source: examples/python/collect_paginated_news_headlines/README.md
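
Hard-coded selectors for the next-page control can be replaced by an element query, and the loop can stop early when no such control is found. This is a synchronous sketch under stated assumptions: the response of `query_elements()` exposes a `next_page_button` attribute that is `None` when the element is missing, and extracted rows live under an `items` key. `collect_pages` is a hypothetical helper.

```python
def collect_pages(page, data_query, next_query, max_pages):
    """Collect items from up to max_pages pages.

    Stops early when the next-page control cannot be located. Assumes the
    element response exposes `next_page_button` (None when not found).
    """
    all_items = []
    for page_index in range(max_pages):
        data = page.query_data(data_query)
        all_items.extend(data.get("items", []))

        if page_index == max_pages - 1:
            break  # requested number of pages reached

        response = page.query_elements(next_query)
        next_button = getattr(response, "next_page_button", None)
        if next_button is None:
            break  # no further pages

        next_button.click()
        page.wait_for_load_state("networkidle")
    return all_items
```

A typical call might look like `collect_pages(page, QUERY, "{ next_page_button }", max_pages=5)`, where `page` is a wrapped Playwright page.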

Concurrent Data Collection

Fetch data from multiple URLs concurrently within the same browser session:

async def main():
    WEBSITE_URLS = [
        "https://bsky.app/search?q=agents+for+the+web",
        "https://dev.to/search?q=agents%20for%20the+web",
        "https://hn.algolia.com/?q=agents%20for%20the+web",
    ]
    
    async with async_playwright() as p:
        async with await p.chromium.launch(headless=True) as browser:
            async with await browser.new_context() as context:
                await asyncio.gather(
                    *(fetch_data(context, url) for url in WEBSITE_URLS)
                )

Source: examples/python/news-aggregator/main.py:1-30

Data Export Patterns

CSV Export

import os

SCRIPT_DIR = os.path.dirname(os.path.abspath(__file__))
CSV_FILE_PATH = os.path.join(SCRIPT_DIR, "news_headlines.csv")

def export_to_csv(data):
    with open(CSV_FILE_PATH, "w", encoding="utf-8") as file:
        file.write("Name, Price\n")
        for product in data["products"]:
            file.write(f"{product['name']},{product['price']}\n")

Source: examples/python/list_query_usage/main.py:24-33

Cleaning Data for Export

When exporting to CSV, clean special characters to avoid formatting issues:

for item in data["items"]:
    # Strip '|' from entry to avoid CSV formatting issues
    clean_entry = item["entry"].replace("|", "")
    new_lines.append(
        f"{item['published_date']} | {clean_entry} | {item['url']}\n"
    )

Source: examples/python/news-aggregator/main.py:45-50
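
As an alternative to stripping delimiter characters by hand, Python's standard `csv` module quotes fields that contain the delimiter, so the original text is preserved. A minimal sketch follows. `items_to_csv` is a hypothetical helper, and the field names follow the news-aggregator query used earlier.

```python
import csv
import io


def items_to_csv(items, fieldnames):
    """Serialize item dictionaries to CSV text with proper quoting.

    Keys not listed in `fieldnames` are ignored; missing keys become
    empty cells.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=fieldnames,
        extrasaction="ignore",
        lineterminator="\n",
    )
    writer.writeheader()
    for item in items:
        writer.writerow(item)
    return buffer.getvalue()


if __name__ == "__main__":
    rows = [{"published_date": "01/02/2025", "entry": "Agents, for the web", "url": "https://example.com"}]
    print(items_to_csv(rows, ["published_date", "entry", "url"]))
```

The returned text can be written to a file opened with `newline=""`, which is the mode the `csv` documentation recommends for CSV output.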

Query Workflow Diagram

graph TD
    A[Initialize Browser with Playwright] --> B[Wrap Page with AgentQL]
    B --> C[Navigate to Target URL]
    C --> D{Select Query Method}
    D -->|Extract Data| E[Use query_data with QUERY]
    D -->|Locate Elements| F[Use query_elements or get_by_prompt]
    E --> G[Process Results]
    F --> H[Interact with Elements via Playwright]
    H --> G
    G --> I{More Pages?}
    I -->|Yes| C
    I -->|No| J[Export/Return Results]

Common Query Patterns Summary

| Pattern | Use Case | Example Query |
| --- | --- | --- |
| List extraction | Products, articles, items | products[] { name, price } |
| Type conversion | Numeric data | price(integer) |
| Format hints | Date formatting | date(convert to MM/DD/YYYY) |
| Flexible matching | Ambiguous content | items(might be articles)[] |
| Natural language | Element location | get_by_prompt("Submit button") |

Working with the AgentQL Debugger

The AgentQL Debugger Chrome extension allows you to:

  • Test queries interactively on any webpage
  • Refine natural language selectors
  • Verify element selection before writing scripts

Install the extension and use it to experiment with queries before integrating them into your scripts.

Best Practices

  1. Start with the Debugger - Test queries in the Chrome extension before coding
  2. Use type conversions - Specify (integer) or (float) for numeric fields
  3. Handle edge cases - Use format instructions like return "n/a" if not available
  4. Clean exported data - Remove special characters before CSV export
  5. Test pagination - Verify scroll and navigation methods work for your target site
  6. Use natural language sparingly - Reserve get_by_prompt() for complex or dynamic selectors

Source: https://github.com/tinyfish-io/agentql / Human Manual

Browser Modes and Configuration

Related topics: Integrations and Framework Connections

AgentQL provides flexible browser configuration options through its integration with Playwright, enabling developers to customize browser behavior for various use cases including headless automation, stealth operations, human-like interaction patterns, and remote browser connections.

Overview

Browser modes in AgentQL determine how the underlying Playwright browser instance operates during data extraction and automation tasks. The configuration system supports multiple deployment scenarios ranging from fully automated server-side operations to interactive debugging sessions.

The core browser configuration is handled through the agentql.wrap() function for synchronous operations and agentql.wrap_async() for asynchronous workflows, which accept a Playwright page object and enable AgentQL's query capabilities on top of it.

Source: examples/python/news-aggregator/main_sync.py

Browser Launch Configuration

Standard Browser Launch

The most common approach involves launching a browser instance directly within the script using Playwright's launch API. This provides full control over browser settings and lifecycle management.

from playwright.sync_api import sync_playwright
import agentql

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    context = browser.new_context()
    page = agentql.wrap(context.new_page())
    # Perform operations
    browser.close()

Source: examples/python/news-aggregator/main_sync.py

Asynchronous Browser Launch

For applications requiring concurrent operations, AgentQL supports asynchronous browser management through Python's asyncio:

import asyncio
from playwright.async_api import async_playwright
import agentql

async def main():
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        context = await browser.new_context()
        page = await agentql.wrap_async(context.new_page())
        await page.goto(url)
        # Perform operations
        await browser.close()

Source: examples/python/news-aggregator/main.py

Headless Mode

Headless mode runs the browser without a visible UI window, making it ideal for server-side automation, continuous integration pipelines, and resource-constrained environments. AgentQL examples consistently demonstrate headless configuration for production deployments.

Configuration Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| headless | boolean | true | Controls UI visibility |
| args | list | [] | Chromium command-line arguments |
| downloads_path | string | None | Directory for download operations |

Source: examples/python/collect_paginated_news_headlines/README.md
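
The parameters above are keyword arguments of Playwright's launch() call. The following is a minimal sketch of choosing them at runtime, so the same script can run headless in CI and headed during local debugging. The environment variable names HEADLESS and DOWNLOADS_DIR and the helper build_launch_options are hypothetical names chosen for this illustration; they are not part of AgentQL or Playwright.

```python
import os


def build_launch_options(env=None):
    """Return keyword arguments for chromium.launch().

    HEADLESS and DOWNLOADS_DIR are hypothetical variable names used only in
    this sketch; headless, args and downloads_path are Playwright parameters.
    """
    env = os.environ if env is None else env
    # Anything other than an explicit "0", "false" or "no" keeps the browser headless.
    headless = env.get("HEADLESS", "true").strip().lower() not in {"0", "false", "no"}
    options = {"headless": headless, "args": []}
    downloads_dir = env.get("DOWNLOADS_DIR")
    if downloads_dir:
        options["downloads_path"] = downloads_dir
    return options


def main():
    # Imported lazily so the helper above works without a browser installed.
    from playwright.sync_api import sync_playwright
    import agentql

    with sync_playwright() as p:
        browser = p.chromium.launch(**build_launch_options())
        page = agentql.wrap(browser.new_page())
        page.goto("https://example.com")
        browser.close()
```

Calling main() with HEADLESS=false set in the shell opens a visible window; leaving the variable unset keeps the default headless behavior.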

Headless Browser Workflow

graph TD
    A[Initialize Playwright] --> B[Launch Chromium with headless=True]
    B --> C[Create Browser Context]
    C --> D[Wrap Page with AgentQL]
    D --> E[Execute Query Operations]
    E --> F[Close Browser]

Source: examples/python/run_script_in_headless_browser/main.py

Stealth Mode

Stealth mode configures the browser to minimize detection by anti-bot systems. This involves modifying browser attributes and behaviors that automated browsers typically expose.

Implementation Example

The stealth mode example demonstrates configuration to avoid common automation detection vectors:

from playwright.sync_api import sync_playwright
import agentql

with sync_playwright() as p:
    browser = p.chromium.launch(
        headless=True,
        args=['--disable-blink-features=AutomationControlled'],
        # Drop the --enable-automation switch that Playwright passes by default
        ignore_default_args=['--enable-automation'],
    )
    context = browser.new_context()
    # Additional stealth configurations
    page = agentql.wrap(context.new_page())

Source: examples/python/stealth_mode/main.py

Stealth Configuration Options

| Configuration | Purpose | Implementation |
| --- | --- | --- |
| AutomationControlled flag | Hide webdriver presence | Chromium launch arguments |
| User agent spoofing | Match real browser signatures | Browser context settings |
| Navigator properties | Normalize exposed JavaScript values | Page.evaluate() modifications |

Source: examples/python/stealth_mode/main.py
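
The three rows above can be combined in one place. The sketch below is an illustration under stated assumptions, not the contents of the stealth_mode example: the helper names are hypothetical, the viewport size is arbitrary, and the navigator property is patched with Playwright's add_init_script() (which runs before any page script) rather than with a later evaluate() call.

```python
# Launch argument from the stealth example on this page.
STEALTH_LAUNCH_ARGS = ["--disable-blink-features=AutomationControlled"]

# Runs before any page script and hides the navigator.webdriver flag.
HIDE_WEBDRIVER_SCRIPT = (
    "Object.defineProperty(navigator, 'webdriver', {get: () => undefined});"
)


def stealth_context_options(user_agent, locale="en-US"):
    """Context settings that resemble a regular desktop browser.

    The caller supplies the user agent; this sketch does not pick one.
    """
    if not user_agent.strip():
        raise ValueError("user_agent must be a non-empty string")
    return {
        "user_agent": user_agent,
        "locale": locale,
        "viewport": {"width": 1366, "height": 768},  # arbitrary common size
    }


def open_stealth_page(playwright, user_agent):
    """Launch Chromium with the settings above and return (browser, page)."""
    import agentql  # imported lazily; requires the agentql package

    browser = playwright.chromium.launch(headless=True, args=STEALTH_LAUNCH_ARGS)
    context = browser.new_context(**stealth_context_options(user_agent))
    context.add_init_script(HIDE_WEBDRIVER_SCRIPT)
    return browser, agentql.wrap(context.new_page())
```

open_stealth_page() expects the object yielded by sync_playwright(); the caller remains responsible for closing the returned browser.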

Humanlike Mode and Anti-Bot Evasion

Humanlike mode simulates genuine user behavior to evade anti-bot detection systems. This includes randomizing interaction timing, mimicking scroll patterns, and implementing natural mouse movements.

Python Implementation

import random
import time
from playwright.sync_api import sync_playwright
import agentql

def humanlike_scroll(page):
    """Simulate natural scrolling behavior"""
    scroll_amount = random.randint(300, 800)
    page.evaluate(f'window.scrollBy(0, {scroll_amount})')
    time.sleep(random.uniform(0.5, 2.0))

with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    page = agentql.wrap(browser.new_page())
    
    # Apply humanlike interaction patterns
    page.goto(target_url)
    for _ in range(random.randint(2, 5)):
        humanlike_scroll(page)

Source: examples/python/humanlike-antibot/main.py

JavaScript Implementation

const { wrap, configure } = require('agentql');
const { chromium } = require('playwright');

async function humanlikeDelay() {
  const delay = Math.floor(Math.random() * 2000) + 500;
  return new Promise(resolve => setTimeout(resolve, delay));
}

async function main() {
  const browser = await chromium.launch({ headless: false });
  const page = await wrap(await browser.newPage());
  
  await page.goto(url);
  await humanlikeDelay();
}

Source: examples/js/humanlike-antibot/main.js

Humanlike Interaction Patterns

| Pattern | Description | Anti-Bot Impact |
| --- | --- | --- |
| Random delays | Variable wait times between actions | Prevents uniform timing detection |
| Variable scroll | Randomized scroll distances and speeds | Mimics human browsing behavior |
| Mouse movements | Non-linear cursor paths | Evades motion tracking systems |
| Typing simulation | Randomized keystroke intervals | Avoids robotic typing detection |

Source: examples/python/humanlike-antibot/main.py
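
The scroll pattern is shown in the Python implementation above; the mouse and typing rows can be sketched the same way. The helper names, delay ranges, and the choice of a quadratic Bezier curve are assumptions made for this illustration and are not taken from the humanlike-antibot example. The page argument is expected to be a synchronous Playwright page (wrapped or not).

```python
import random


def keystroke_delays(text, rng=None, low_ms=60, high_ms=220):
    """One randomized delay in milliseconds per character of text."""
    rng = rng or random
    return [rng.uniform(low_ms, high_ms) for _ in text]


def curved_mouse_path(start, end, rng=None, steps=20, max_bend=80):
    """Points along a quadratic Bezier curve from start to end.

    A randomly offset control point bends the path so the cursor does not
    travel in a straight line.
    """
    rng = rng or random
    (x0, y0), (x2, y2) = start, end
    # Control point: midpoint of the segment, pushed off the line at random.
    cx = (x0 + x2) / 2 + rng.uniform(-max_bend, max_bend)
    cy = (y0 + y2) / 2 + rng.uniform(-max_bend, max_bend)
    points = []
    for i in range(steps + 1):
        t = i / steps
        x = (1 - t) ** 2 * x0 + 2 * (1 - t) * t * cx + t ** 2 * x2
        y = (1 - t) ** 2 * y0 + 2 * (1 - t) * t * cy + t ** 2 * y2
        points.append((x, y))
    return points


def type_like_a_person(page, text, rng=None):
    """Type text one character at a time, pausing after each key."""
    for char, delay in zip(text, keystroke_delays(text, rng)):
        page.keyboard.type(char)
        page.wait_for_timeout(delay)


def move_like_a_person(page, start, end, rng=None):
    """Move the mouse along the curved path in small steps."""
    for x, y in curved_mouse_path(start, end, rng):
        page.mouse.move(x, y)
```

Passing a seeded random.Random instance as rng makes a run reproducible, which helps when debugging a flow that only fails intermittently.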

Remote Browser Connection

AgentQL supports connecting to existing browser instances, including ones running remotely. This is useful for cloud browser services such as Cloudflare Browser Rendering and for distributed scraping architectures.

Connection Workflow

graph LR
    A[Start Remote Browser<br/>with debugging port] --> B[Connect via<br/>WebSocket URL]
    B --> C[Create AgentQL Page]
    C --> D[Execute Queries]
    D --> E[Retrieve Results]

Source: examples/js/use-existing-browser/README.md

WebSocket Connection Format

Remote browser connections use the WebSocket debugging protocol:

ws://127.0.0.1:9222/devtools/browser/{browser-id}

Source: examples/python/use_existing_browser/README.md
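
The {browser-id} part of the URL changes every time the browser starts, so a hard-coded value goes stale. A Chromium instance started with a remote debugging port also serves an HTTP endpoint, /json/version, whose JSON body includes the current webSocketDebuggerUrl. The sketch below reads it with the standard library; the function names are hypothetical and the host and port defaults simply mirror the URL format above.

```python
import json
from urllib.request import urlopen


def parse_ws_url(version_json):
    """Extract the browser-level WebSocket URL from a /json/version body."""
    info = json.loads(version_json)
    ws_url = info.get("webSocketDebuggerUrl", "")
    if not ws_url.startswith(("ws://", "wss://")):
        raise ValueError("response does not contain a WebSocket debugger URL")
    return ws_url


def discover_ws_url(host="127.0.0.1", port=9222, timeout=5):
    """Ask a running Chromium instance for its current WebSocket URL."""
    with urlopen(f"http://{host}:{port}/json/version", timeout=timeout) as response:
        return parse_ws_url(response.read().decode("utf-8"))
```

The value returned by discover_ws_url() can be passed to connect_over_cdp() in place of a constant.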

Python Remote Browser Usage

import agentql
from playwright.sync_api import sync_playwright

# Connect to existing browser via DevTools URL
REMOTE_BROWSER_URL = "ws://127.0.0.1:9222/devtools/browser/387adf4c-243f-4051-a181-46798f4a46f4"

with sync_playwright() as p:
    # Connect to the remote browser instead of launching
    browser = p.chromium.connect_over_cdp(REMOTE_BROWSER_URL)
    context = browser.new_context()
    page = agentql.wrap(context.new_page())
    
    # Navigate to pages within the connected browser
    page.goto("https://scrapeme.live/shop/Charmander/")
    data = page.query_data(QUERY)

Source: examples/python/use_remote_browser/main.py

JavaScript Remote Browser Usage

const { wrap, configure } = require('agentql');
const { chromium } = require('playwright');

const REMOTE_BROWSER_URL = 'ws://127.0.0.1:9222/devtools/browser/387adf4c-243f-4051-a181-46798f4a46f4';

async function main() {
  // Connect to existing browser instance
  const browser = await chromium.connectOverCDP(REMOTE_BROWSER_URL);
  const page = await wrap(await browser.newPage());
  
  await page.goto('https://scrapeme.live/shop/Charmander/');
  const data = await page.queryData(QUERY);
}

Source: examples/js/use-existing-browser/README.md

Browser Context Configuration

Browser contexts provide isolation between browsing sessions, enabling parallel operations and independent cookie/storage management.

Context Options

| Option | Type | Description |
| --- | --- | --- |
| viewport | dict | Browser window dimensions |
| user_agent | string | Custom user agent string |
| locale | string | Browser locale setting |
| timezone_id | string | Simulated timezone |
| permissions | list | Granted permissions |
| ignore_https_errors | boolean | SSL certificate handling |

Source: examples/js/package.json

Multiple Context Example

from playwright.sync_api import sync_playwright
import agentql

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    
    # Create multiple independent contexts
    context1 = browser.new_context(
        viewport={'width': 1920, 'height': 1080},
        user_agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
    )
    
    context2 = browser.new_context(
        viewport={'width': 1366, 'height': 768},
        locale='en-GB'
    )
    
    page1 = agentql.wrap(context1.new_page())
    page2 = agentql.wrap(context2.new_page())

Source: examples/python/news-aggregator/main_sync.py

API Key Configuration

AgentQL requires API key configuration for cloud-based query execution. The configuration can be set explicitly or rely on environment variables.

Python Configuration

from agentql import configure

# Set API key explicitly
configure(api_key="your-agentql-api-key")

JavaScript Configuration

const { wrap, configure } = require('agentql');

// Configure API key
configure({
  apiKey: process.env.AGENTQL_API_KEY
});

Source: examples/js/get-by-prompt/main.js
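
A Python counterpart that reads the same AGENTQL_API_KEY variable and fails early when it is missing might look like the sketch below. The helper names are hypothetical; only configure(api_key=...) comes from the configuration shown above.

```python
import os


def resolve_api_key(env=None):
    """Return the API key from the environment, or raise if it is absent."""
    env = os.environ if env is None else env
    key = env.get("AGENTQL_API_KEY", "").strip()
    if not key:
        raise RuntimeError("AGENTQL_API_KEY is not set")
    return key


def configure_agentql(env=None):
    """Pass the resolved key to AgentQL before any page is wrapped."""
    import agentql  # imported lazily; requires the agentql package

    agentql.configure(api_key=resolve_api_key(env))
```

Failing at startup gives a clearer error than a rejected query several steps into a scraping run.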

Page Navigation and Waiting

Proper page load handling is crucial for reliable data extraction across different website architectures.

Wait Strategies

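| Strategy | Use Case | Implementation |
| --- | --- | --- |
| networkidle | SPA with dynamic content | page.wait_for_load_state('networkidle') |
| domcontentloaded | Simple pages | page.goto(url, wait_until='domcontentloaded'); without wait_until, goto waits for the load event |
| commit | Fast redirects | page.goto(url, wait_until='commit') |
| timeout | Slow connections | page.goto(url, timeout=30000), in milliseconds; 30000 is Playwright's default, so raise it for slow connections |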

Source: examples/python/collect_paginated_news_headlines/README.md
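
Playwright's goto() accepts a wait_until value of commit, domcontentloaded, load, or networkidle, plus a timeout in milliseconds. A small validation helper, sketched below with hypothetical names, catches a misspelled strategy before the browser is involved.

```python
VALID_WAIT_STATES = ("commit", "domcontentloaded", "load", "networkidle")


def navigation_kwargs(wait_until="load", timeout_ms=30_000):
    """Build keyword arguments for page.goto() after validating them."""
    if wait_until not in VALID_WAIT_STATES:
        raise ValueError(f"wait_until must be one of {VALID_WAIT_STATES}")
    if timeout_ms < 0:
        raise ValueError("timeout_ms must not be negative")
    return {"wait_until": wait_until, "timeout": timeout_ms}


def open_page(page, url, wait_until="load", timeout_ms=30_000):
    """Navigate a synchronous Playwright page with the chosen strategy."""
    return page.goto(url, **navigation_kwargs(wait_until, timeout_ms))
```

For a single-page application, open_page(page, url, wait_until="networkidle", timeout_ms=60_000) waits for network activity to settle and allows a longer load than the default.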

Navigation with AgentQL

import agentql
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = agentql.wrap(browser.new_page())
    
    # Standard navigation
    page.goto("https://example.com")
    
    # Wait for dynamic content
    page.wait_for_load_state('networkidle')
    
    # Execute query after page is ready
    data = page.query_data(QUERY)

Source: examples/js/collect-paginated-news-headlines/README.md

Best Practices

Mode Selection Guidelines

  • Headless Mode: Use for production deployments, CI/CD pipelines, and server-side automation where no user interaction is needed
  • Stealth Mode: Apply when targeting sites with anti-bot measures that check for automation indicators
  • Humanlike Mode: Reserve for high-security targets requiring behavioral analysis evasion
  • Remote Browser: Employ when debugging, testing across specific browser versions, or integrating with cloud browser services

Edge Environment Considerations

Community issue #128 discusses the challenges of using AgentQL with Cloudflare's Browser Rendering in edge environments. Some Node.js APIs behave differently in edge contexts, requiring adaptation of browser configuration code.

Source: github.com/tinyfish-io/agentql/issues/128

Performance Optimization

| Technique | Impact | Implementation |
| --- | --- | --- |
| Context reuse | Reduces memory overhead | Reuse contexts for related pages |
| Async operations | Improves throughput | Use wrap_async() for concurrent tasks |
| Headless mode | Reduces resource usage | Default to headless=True |
| Selective waits | Faster execution | Use specific wait conditions over timeouts |

Source: examples/python/news-aggregator/main.py

Source: https://github.com/tinyfish-io/agentql / Human Manual

Data Collection Patterns

Related topics: Query Examples and Patterns, Integrations and Framework Connections

AgentQL provides robust patterns for collecting structured data from websites. These patterns leverage the query language's natural language selectors and structured output capabilities to extract data reliably across different page layouts and UI changes.

Overview

Data collection in AgentQL revolves around extracting structured information from web pages using queries that define the expected data shape. The patterns demonstrated in the examples cover common scenarios including paginated data collection, multi-URL aggregation, and list extraction with transformations.

Pagination Patterns

Pagination patterns enable collecting data that spans multiple pages, a common requirement for e-commerce listings, news archives, and search results.

Python Implementation

The paginated data collection pattern uses a loop structure that:

  1. Navigates to the initial page
  2. Extracts data using query_data() with a structured query
  3. Detects pagination elements to proceed to the next page
  4. Continues until no more pages exist or a limit is reached

# Source: examples/python/collect_paginated_ecommerce_listing_data/main.py
import asyncio

from playwright.async_api import async_playwright
import agentql

URL = "https://scrapeme.live/shop"

# Same shape as the basic list query used elsewhere on this page
PRODUCT_DATA_QUERY = """
{
    products[] {
        name
        price(integer)
    }
}
"""

async def main():
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        page = await agentql.wrap_async(browser.new_page())

        all_products = []
        current_page = 1
        max_pages = 5

        while current_page <= max_pages:
            await page.goto(f"{URL}/page/{current_page}/")

            # Query structured data from the page
            data = await page.query_data(PRODUCT_DATA_QUERY)
            products = data.get("products") or []
            if not products:
                break  # No more pages
            all_products.extend(products)

            current_page += 1

        await browser.close()
        return all_products

asyncio.run(main())

JavaScript Implementation

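// Source: examples/js/collect-paginated-ecommerce-data/main.js
const { chromium } = require('playwright');
const agentql = require('agentql');

// Placeholder values; replace with the target site and query
const baseUrl = 'https://example.com/products';
const maxPages = 5;
const PRODUCT_QUERY = `
{
    products[] {
        name
        price(integer)
    }
}
`;

(async () => {
  const browser = await chromium.launch({ headless: true });
  // newPage() returns a promise, so resolve it before wrapping
  const page = await agentql.wrap(await browser.newPage());

  let pageNum = 1;
  const allProducts = [];

  while (pageNum <= maxPages) {
    await page.goto(`${baseUrl}?page=${pageNum}`);
    const data = await page.queryData(PRODUCT_QUERY);
    allProducts.push(...(data.products ?? []));
    pageNum++;
  }

  await browser.close();
})();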

Pagination Query Structure

| Element | Query Field | Purpose |
| --- | --- | --- |
| Product cards | products[] | Array of product items on each page |
| Pagination control | next_page_button | Element to click for next page |
| Item counter | total_items | Total count displayed on page |
| Page indicator | current_page | Current page number |
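
The fields above split naturally into a data query (products, total_items, current_page) and an element query (next_page_button). The sketch below builds both and adds a stop condition. The helper names are hypothetical, and two behaviors are assumptions to verify against the SDK reference: that query_elements() exposes the result as an attribute named after the query field, and that a control AgentQL cannot find comes back as None.

```python
NEXT_PAGE_QUERY = "{\n    next_page_button\n}"


def build_pagination_query(product_fields=("name", "price(integer)")):
    """Data query covering the product list and the two counters."""
    fields = "\n".join(f"        {field}" for field in product_fields)
    return (
        "{\n"
        "    products[] {\n"
        f"{fields}\n"
        "    }\n"
        "    total_items(integer)\n"
        "    current_page(integer)\n"
        "}"
    )


def collected_everything(all_products, page_data):
    """True when the page was empty or the running total reached total_items."""
    products = page_data.get("products") or []
    if not products:
        return True
    total = page_data.get("total_items")
    return isinstance(total, int) and len(all_products) >= total


def go_to_next_page(page):
    """Click the next-page control if one is found; return False otherwise."""
    response = page.query_elements(NEXT_PAGE_QUERY)
    button = response.next_page_button  # assumed to be None when not found
    if button is None:
        return False
    button.click()
    return True
```

A collection loop would call page.query_data(build_pagination_query()), extend its list, and stop when either collected_everything() or a failed go_to_next_page() says there is nothing left.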

Multi-URL Aggregation Patterns

Collecting data from multiple URLs simultaneously improves efficiency when you need to aggregate information from disparate sources.

Concurrent Tab Collection

The news aggregator example demonstrates opening multiple URLs in separate tabs within the same browser context:

# Source: examples/python/news-aggregator/main.py
WEBSITE_URLS = [
    "https://duckduckgo.com/?q=agents+for+the+web&t=h_&iar=news&ia=news",
    # Additional URLs...
]

async def main():
    async with async_playwright() as p, await p.chromium.launch(
        headless=True
    ) as browser, await browser.new_context() as context:
        # Open multiple tabs concurrently to fetch data
        await asyncio.gather(
            *(fetch_data(context, url) for url in WEBSITE_URLS)
        )

Data Flow Architecture

graph TD
    A[Start Browser Context] --> B[Create Multiple Tabs]
    B --> C[Concurrent URL Navigation]
    C --> D[Query Data per Page]
    D --> E[Transform & Clean Data]
    E --> F[Write to CSV/JSON]
    F --> G[Close Browser]
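
With many URLs, opening every tab at once can exhaust memory or trip rate limits. One common refinement of the gather pattern, sketched here as an assumption rather than something the news-aggregator example does, is to cap the number of tabs that run at the same time with a semaphore. The names gather_with_limit and fetch are hypothetical.

```python
import asyncio


async def gather_with_limit(urls, fetch, limit=3):
    """Run fetch(url) for every URL, with at most `limit` running at once.

    Results come back in the same order as urls, as with asyncio.gather().
    """
    semaphore = asyncio.Semaphore(limit)

    async def guarded(url):
        # Each task waits here until one of the `limit` slots is free.
        async with semaphore:
            return await fetch(url)

    return await asyncio.gather(*(guarded(url) for url in urls))
```

Inside main(), the call would look like await gather_with_limit(WEBSITE_URLS, lambda url: fetch_data(context, url), limit=3).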

Handling Multi-Source Data

Each source may return data in different structures. The aggregator normalizes this using AgentQL queries that return consistent field names:

# Source: examples/python/news-aggregator/main.py
QUERY = """
{
    items[] {
        entry
        published_date
        url
        outlet
        author
    }
}
"""

List Extraction Patterns

Extracting lists of items requires defining array fields in the AgentQL query syntax using [] notation.

Basic List Query

# Source: examples/python/first_steps/main.py
PRODUCT_DATA_QUERY = """
{
    price_currency
    products[] {
        name
        price(integer)
    }
}
"""

The products[] notation defines an array of items, where each item contains name and price fields. The (integer) modifier transforms the price string to a numeric type.

Data Transformation During Extraction

AgentQL supports inline transformations within queries:

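| Transform | Syntax | Example |
| --- | --- | --- |
| Type conversion | (type) | price(integer), date(date) |
| String cleaning | .strip() | title.strip() |
| Array filtering | [condition] | items[count > 0] |

Only the parenthesized type hint appears in the queries shown on this page.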

Maps and Location Data Collection

The maps scraper examples demonstrate collecting geographic and location-based data:

Python Maps Scraper

# Source: examples/python/maps_scraper/main.py
LOCATION_QUERY = """
{
    business_name
    rating
    reviews_count
    address
    phone
    website
    category
}
"""

JavaScript Maps Scraper

// Source: examples/js/maps_scraper/main.js
const LOCATION_QUERY = `
{
    business_name
    rating
    reviews_count
    address
    phone
    website
    category
}
`;

Both implementations follow the same pattern:

  1. Navigate to the map service URL with search parameters
  2. Wait for results to load
  3. Execute the query to extract structured location data
  4. Store results in the desired format

Data Export Patterns

AgentQL examples demonstrate multiple export formats for collected data.

CSV Export

# Source: examples/python/news-aggregator/main.py
import os

import agentql
from playwright.async_api import BrowserContext

SCRIPT_DIR = os.path.dirname(os.path.abspath(__file__))
CSV_FILE_PATH = os.path.join(SCRIPT_DIR, "news_headlines.csv")

async def fetch_data(context: BrowserContext, session_url):
    page = await agentql.wrap_async(context.new_page())
    await page.goto(session_url)

    data = await page.query_data(QUERY)

    # Prepare new data with pipe-separated format
    new_lines = []
    for item in data["items"]:
        # Strip '|' from entry to avoid CSV formatting issues
        clean_entry = item["entry"].replace("|", "")
        new_lines.append(
            f"{item['published_date']} | {clean_entry} | {item['url']} | {item['outlet']} | {item['author']}\n"
        )

    # Append the rows to the output file (simplified write step)
    with open(CSV_FILE_PATH, "a", encoding="utf-8") as file:
        file.writelines(new_lines)

Data Cleaning During Export

| Issue | Solution | Example |
| --- | --- | --- |
| CSV delimiter collision | Strip delimiter characters | `item["entry"].replace("\|", "")` |
| Type inconsistency | Apply transforms in query | price(integer) |
| Missing fields | Provide defaults | field or "N/A" |
| Whitespace | Trim strings | field.strip() |
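
Stripping the delimiter changes the headline text. An alternative, shown here as a sketch rather than what the news-aggregator example does, is to let Python's csv module quote any field that contains the delimiter. The function name and the "N/A" default are hypothetical.

```python
import csv
import io

FIELDS = ["published_date", "entry", "url", "outlet", "author"]


def to_pipe_delimited(items):
    """Serialize items as pipe-delimited text, quoting fields when needed."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=FIELDS, delimiter="|", lineterminator="\n"
    )
    writer.writeheader()
    for item in items:
        # Missing or empty values get a default so every row has five columns.
        writer.writerow({field: item.get(field) or "N/A" for field in FIELDS})
    return buffer.getvalue()
```

Because the writer quotes "A | B" instead of altering it, csv.DictReader with delimiter="|" reads the original text back unchanged.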

Error Handling Patterns

Resilient data collection requires proper error handling for network issues, page load failures, and query mismatches.

Try-Except Block Pattern

# Source: examples/python/collect_paginated_news_headlines/main.py
async def collect_headlines(page, query, max_pages=10):
    all_headlines = []
    
    for page_num in range(1, max_pages + 1):
        try:
            await page.goto(f"{BASE_URL}&page={page_num}")
            await page.wait_for_load_state("networkidle")
            
            data = await page.query_data(query)
            headlines = data.get("headlines", [])
            
            if not headlines:
                break  # No more data available
                
            all_headlines.extend(headlines)
            
        except Exception as e:
            print(f"Error on page {page_num}: {e}")
            continue
            
    return all_headlines
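
The loop above skips a page as soon as it fails. For transient failures such as a slow response, retrying before giving up loses less data. The helper below is a generic sketch with hypothetical names and delay values; it is not an AgentQL feature.

```python
import asyncio


async def with_retries(operation, attempts=3, base_delay=0.5):
    """Await operation() up to `attempts` times.

    The pause doubles after each failure (base_delay, 2x, 4x, ...). The last
    exception is re-raised once the attempts are used up.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return await operation()
        except Exception:
            if attempt == attempts:
                raise
            await asyncio.sleep(base_delay * 2 ** (attempt - 1))
```

Inside the loop, data = await with_retries(lambda: page.query_data(query)) replaces the direct call, and the surrounding try/except then only sees failures that persisted through every attempt.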

Resilience to UI Changes

AgentQL's natural language selectors provide resilience to UI changes. When page structure changes, queries using semantic descriptions continue to work, unlike CSS selectors that break when DOM structure changes.

Best Practices

Query Design

  • Use semantic field names: Match query field names to visible content, not DOM attributes
  • Define array fields explicitly: Use [] notation for lists of similar items
  • Apply transforms early: Use type conversions in queries rather than post-processing
  • Handle missing data: Design queries with optional fields using the ? modifier

Performance Optimization

| Technique | Implementation |
| --- | --- |
| Concurrent tab collection | Use asyncio.gather() for multiple URLs |
| Headless browsing | Set headless=True for server environments |
| Context reuse | Reuse browser contexts to maintain session state |
| Pagination limits | Set maximum page counts to prevent infinite loops |

Cross-Site Compatibility

The same AgentQL query can work across sites with similar content structure. For example, a product listing query designed for one e-commerce site may work on another with minimal modification due to the natural language selector approach.

Integrations and Framework Connections

Related topics: REST API, Browser Modes and Configuration

AgentQL provides flexible integration options with various frameworks, automation tools, and deployment environments. This page covers the available SDKs, framework connections, authentication patterns, and deployment considerations.

Overview

AgentQL connects LLMs and AI agents to the web through its query language and Playwright integrations. The platform offers multiple integration pathways:

| Integration Type | Description |
| --- | --- |
| Python SDK | Running automation and scraping scripts with AgentQL queries in Python |
| JavaScript SDK | Running automation and scraping scripts with AgentQL queries in JavaScript |
| REST API | Executing queries without an SDK |
| MCP Server | Model Context Protocol integration for AI agents |
| Framework Integrations | LangChain, Zapier, and other automation tools |

SDK Integration Architecture

AgentQL provides seamless integration with Playwright, the browser automation library. Both Python and JavaScript SDKs wrap Playwright's browser context to enable AgentQL querying capabilities.

Python SDK Integration

The Python SDK integrates with Playwright's sync and async APIs. The core integration pattern uses the agentql.wrap() function to extend Playwright page objects with AgentQL querying capabilities.

import agentql
from agentql.ext.playwright.sync_api import Page
from playwright.sync_api import sync_playwright

def main():
    with sync_playwright() as playwright, playwright.chromium.launch(headless=False) as browser:
        page = agentql.wrap(browser.new_page())
        page.goto(URL)

Source: examples/python/first_steps/main.py:1-19

JavaScript SDK Integration

The JavaScript SDK follows a similar pattern, wrapping Playwright page objects to provide AgentQL querying methods.

const { chromium } = require('playwright');
const agentql = require('agentql');

(async () => {
  const browser = await chromium.launch();
  const page = await agentql.wrap(await browser.newPage());
  await page.goto('https://example.com');
})();

Source: examples/js/log-into-sites/main.js:1-50

SDK Dependencies

| SDK | Key Dependencies |
| --- | --- |
| Python SDK | playwright, agentql |
| JavaScript SDK | playwright, playwright-dompath, openai, agentql |

Source: examples/js/package.json:1-30

Authentication and Session Management

AgentQL supports authenticated web interactions through session persistence and browser context management.

Login Pattern

Authentication is achieved by performing login actions before executing AgentQL queries. The pattern involves navigating to the login page, performing credentials entry, and then executing queries within the authenticated session.

async def log_in(page):
    await page.goto(LOGIN_URL)
    await page.fill(USERNAME_SELECTOR, USERNAME)
    await page.fill(PASSWORD_SELECTOR, PASSWORD)
    await page.click(LOGIN_BUTTON)
    await page.wait_for_load_state("networkidle")

Source: examples/python/log_into_sites/main.py:1-60

Session Persistence

Authenticated sessions can be saved and restored using Playwright's storage state mechanism. This allows maintaining login state across script executions.

async def save_authenticated_session(context, storage_path):
    await context.storage_state(path=storage_path)

async def load_authenticated_session(browser, storage_path):
    context = await browser.new_context(storage_state=storage_path)
    return context

Source: examples/python/save_and_load_authenticated_session/main.py:1-80

Session Flow

graph TD
    A[Launch Browser] --> B{Check for Existing Session}
    B -->|Session Exists| C[Load Storage State]
    B -->|No Session| D[Create New Context]
    C --> E[Navigate to Target URL]
    D --> F[Login to Site]
    F --> E
    E --> G[Execute AgentQL Queries]
    G --> H[Optional: Save Session]
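
The branch at the top of the flow can be sketched as follows. The helper names are hypothetical, and log_in stands for a caller-supplied coroutine function such as the one in the Login Pattern section; storage_state is the Playwright parameter used in the session persistence example.

```python
from pathlib import Path


def context_options(storage_path):
    """Return new_context() arguments: saved state if present, else none."""
    path = Path(storage_path)
    if path.is_file() and path.stat().st_size > 0:
        return {"storage_state": str(path)}
    return {}


async def open_context(browser, storage_path, log_in):
    """Reuse a saved session when one exists; otherwise log in and save it."""
    options = context_options(storage_path)
    context = await browser.new_context(**options)
    if not options:
        page = await context.new_page()
        await log_in(page)
        # Persist cookies and local storage for the next run.
        await context.storage_state(path=str(storage_path))
        await page.close()
    return context
```

A saved session can expire on the server side, so a production script would also check that the first authenticated page loads as expected and fall back to log_in when it does not.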

Framework Integrations

LangChain Integration

AgentQL integrates with LangChain for building agent workflows that interact with web pages. The integration allows LangChain agents to use natural language queries that translate to AgentQL queries.

Community Note: The LangChain integration enables AI agents to browse and extract data from websites using natural language instructions.

Zapier Integration

AgentQL provides Zapier integration for no-code automation workflows, enabling users to incorporate web data extraction into automated processes without writing code.

MCP Server

The Model Context Protocol (MCP) server integration allows AI agents to interact with web pages through a standardized protocol. This enables:

  • Remote browser control
  • Query execution via API
  • Integration with AI agent frameworks

External AI Service Integration

AgentQL can be combined with external AI services for advanced data processing, such as sentiment analysis on extracted content.

import os

from openai import OpenAI

def perform_sentiment_analysis(comments):
    client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
    completion = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": SYSTEM_MESSAGE},
            {"role": "user", "content": USER_MESSAGE},
        ],
    )
    return completion.choices[0].message.content

Source: examples/python/perform_sentiment_analysis/main.py:1-50

Data Processing Pipeline

graph LR
    A[Web Page] -->|AgentQL Query| B[Extract Data]
    B --> C[Process with LLM]
    C -->|Sentiment| D[Analysis Results]
    C -->|Summary| E[Content Summary]
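
The hand-off between the extraction step and the LLM step is usually a matter of turning extracted text into chat messages while staying within a size budget. The sketch below shows one way to do that; build_messages and the max_chars budget are hypothetical and not part of the sentiment analysis example.

```python
def build_messages(comments, system_message, max_chars=4000):
    """Turn extracted comments into chat messages, capped at max_chars.

    Whitespace inside each comment is collapsed, empty comments are skipped,
    and collection stops at the first comment that would exceed the budget.
    """
    lines = []
    used = 0
    for comment in comments:
        text = " ".join(str(comment).split())
        if not text:
            continue
        if used + len(text) > max_chars:
            break
        lines.append(f"- {text}")
        used += len(text)
    return [
        {"role": "system", "content": system_message},
        {"role": "user", "content": "\n".join(lines)},
    ]
```

The returned list has the same shape as the messages argument passed to client.chat.completions.create() above.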

Cloudflare Browser Rendering Integration

Community Note: Issue #128 discusses using AgentQL with Cloudflare's Browser Rendering feature, which provides browser instances from Cloudflare Workers via Playwright.

The integration with Cloudflare Browser Rendering enables:

  • Edge-based browser automation
  • Scalable browser infrastructure
  • Serverless web scraping workflows

Edge Environment Considerations

When deploying AgentQL in edge environments like Cloudflare Workers:

  • Node.js APIs may have limitations
  • CDP (Chrome DevTools Protocol) connection handling differs from standard Node.js
  • Browser instance lifecycle management requires careful handling

Source: Issue #128: AgentQL JS x Cloudflare Browser Rendering

REST API Integration

For environments where SDK installation is not feasible, AgentQL provides a REST API for executing queries without an SDK.

| Endpoint Type | Use Case |
| --- | --- |
| Query Execution | Execute AgentQL queries via HTTP |
| Data Extraction | Retrieve structured data from web pages |

Source: REST API Documentation
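
A query request can be assembled with the standard library alone. Treat the specifics in this sketch as assumptions to confirm in the REST API documentation before use: the endpoint URL, the X-API-Key header name, and the query and url body fields are not stated anywhere on this page, and build_query_request is a hypothetical helper.

```python
import json
from urllib.request import Request

# Assumed endpoint; confirm it in the REST API documentation.
QUERY_DATA_URL = "https://api.agentql.com/v1/query-data"


def build_query_request(api_key, url, query):
    """Build (but do not send) a POST request carrying an AgentQL query."""
    body = json.dumps({"query": query, "url": url}).encode("utf-8")
    return Request(
        QUERY_DATA_URL,
        data=body,
        method="POST",
        # Assumed header name for the API key.
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
    )
```

Sending the request is a matter of passing it to urllib.request.urlopen() and decoding the JSON response body.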

Integration Patterns

Concurrent Data Collection

AgentQL supports concurrent page interactions using async patterns:

async def main():
    async with async_playwright() as p, await p.chromium.launch(headless=True) as browser, await browser.new_context() as context:
        await asyncio.gather(
            *(fetch_data(context, url) for url in WEBSITE_URLS)
        )

Source: examples/python/news-aggregator/main.py:1-40

Pagination Handling

Integration with pagination enables data collection across multiple pages:

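async def collect_paginated_data(page, query):
    all_items = []
    while True:
        data = await page.query_data(query)
        all_items.extend(data["items"])
        # Locate the control by description; no match returns None
        next_button = await page.get_by_prompt("button that goes to the next page")
        if next_button is None or not await next_button.is_visible():
            break
        await next_button.click()
        await page.wait_for_load_state("networkidle")
    return all_items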

Multi-Tab Browser Context

For concurrent operations, AgentQL supports multiple tabs within a single browser context:

async def fetch_data(context, url):
    page = await agentql.wrap_async(context.new_page())
    await page.goto(url)
    data = await page.query_data(QUERY)
    return data

Configuration Options

Browser Launch Options

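| Option | Type | Description |
| --- | --- | --- |
| headless | boolean | Run browser without visible UI |
| args | list | Additional browser arguments |
| viewport | dict | Browser viewport dimensions (set on the browser context, not at launch) |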

Query Options

| Option | Description |
| --- | --- |
| timeout | Maximum wait time for query results |
| retry_count | Number of retry attempts on failure |
| strict_mode | Enable strict element matching |

Best Practices

Error Handling

  • Implement retry logic for network failures
  • Handle authentication session expiration gracefully
  • Use appropriate timeouts for slow-loading pages

Resource Management

  • Close browser contexts when operations complete
  • Use headless mode for production deployments
  • Reuse browser instances for multiple queries when possible

Security Considerations

  • Store credentials securely (environment variables, secrets management)
  • Implement session timeout policies
  • Validate SSL certificates for production use

Doramagic Pitfall Log

Source-linked risks stay visible on the manual page so the preview does not read like a recommendation.

Found 8 structured pitfall item(s), including 0 high/blocking item(s). Top priority: Installation risk - Installation risk requires verification.

1. Installation risk: Installation risk requires verification

  • Severity: medium
  • Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: community_evidence:github | cevd_761b694cc0e94100b46ba5683041137b | https://github.com/tinyfish-io/agentql/issues/114

2. Installation risk: Installation risk requires verification

  • Severity: medium
  • Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: community_evidence:github | cevd_55a8aa1466634fb39e0b679f753270ec | https://github.com/tinyfish-io/agentql/issues/148

3. Capability evidence risk: Capability evidence risk requires verification

  • Severity: medium
  • Finding: README/documentation is current enough for a first validation pass.
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: capability.assumptions | github_repo:760722197 | https://github.com/tinyfish-io/agentql

4. Maintenance risk: Maintenance risk requires verification

  • Severity: medium
  • Finding: Project evidence flags a maintenance risk. Review the linked source before relying on this workflow.
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: evidence.maintainer_signals | github_repo:760722197 | https://github.com/tinyfish-io/agentql

5. Security or permission risk: Security or permission risk requires verification

  • Severity: medium
  • Finding: no_demo
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: downstream_validation.risk_items | github_repo:760722197 | https://github.com/tinyfish-io/agentql

6. Security or permission risk: Security or permission risk requires verification

  • Severity: medium
  • Finding: no_demo
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: risks.scoring_risks | github_repo:760722197 | https://github.com/tinyfish-io/agentql

7. Maintenance risk: Maintenance risk requires verification

  • Severity: low
  • Finding: issue_or_pr_quality=unknown.
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: evidence.maintainer_signals | github_repo:760722197 | https://github.com/tinyfish-io/agentql

8. Maintenance risk: Maintenance risk requires verification

  • Severity: low
  • Finding: release_recency=unknown.
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: evidence.maintainer_signals | github_repo:760722197 | https://github.com/tinyfish-io/agentql

Source: Doramagic discovery, validation, and Project Pack records

Community Discussion Evidence

These external discussion links are review inputs, not standalone proof that the project is production-ready.

Sources: 3 (the count of project-level external discussion links exposed on this manual page).

Use: review before install. Open the linked issues or discussions before treating the pack as ready for your environment.

Doramagic exposes project-level community discussion separately from official documentation. Review these links before using agentql with real data or production workflows.

Source: Project Pack community evidence and pitfall evidence