Doramagic Project Pack · Human Manual
agentql
AgentQL addresses a fundamental challenge in web automation: traditional selectors (CSS, XPath) are brittle and break when web pages change. AgentQL uses natural language queries to locate...
Introduction to AgentQL
Related topics: Quick Start Guide, Python SDK
AgentQL is an open-source framework that connects Large Language Models (LLMs) and AI agents to the web through a natural language query language. It enables developers to extract structured data, automate web interactions, and build web scraping solutions using intuitive queries that remain resilient to UI changes over time.
Overview
AgentQL addresses a fundamental challenge in web automation: traditional selectors (CSS, XPath) are brittle and break when web pages change. AgentQL uses natural language queries to locate elements and extract data, making automation scripts more maintainable and adaptable.
The framework integrates seamlessly with Playwright, supporting both Python and JavaScript environments. It works on any webpage—public sites, private pages, URLs behind authentication—regardless of the site's structure or technology.
Source: README.md
Core Features
| Feature | Description |
|---|---|
| Natural Language Selectors | Find elements and data using intuitive queries based on page content |
| Structured Output | Define data shapes within queries for consistent structured results |
| Cross-Site Compatibility | Use the same query across different sites with similar content |
| Transforms and Extracts | Apply data transformations directly within queries |
| Resilience to UI Changes | Queries self-heal as page structures evolve |
| Works on Any Page | Public, private, authenticated—any URL |
Source: README.md
Architecture
AgentQL follows a client-side wrapper pattern where the AgentQL SDK wraps Playwright's page objects to extend their functionality with query capabilities.
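The wrapper pattern described above can be sketched in a few lines of plain Python. This is an illustration of the general technique only, not AgentQL's implementation: `QueryablePage`, `FakePage`, and `echo_resolver` are hypothetical names, and the fake page stands in for a real browser page so the sketch runs without Playwright.

```python
class QueryablePage:
    """Hypothetical wrapper: adds one method, delegates everything else."""

    def __init__(self, inner, resolver):
        self._inner = inner        # the wrapped page-like object
        self._resolver = resolver  # callable standing in for the query backend

    def query_data(self, query: str) -> dict:
        # Extension point: hand the query and the page content to the resolver.
        return self._resolver(query, self._inner.content())

    def __getattr__(self, name):
        # Runs only when normal lookup fails, so every attribute the wrapper
        # does not define falls through to the wrapped object.
        return getattr(self._inner, name)


class FakePage:
    """Stand-in for a browser page so the sketch runs without a browser."""

    def __init__(self):
        self.url = "about:blank"

    def goto(self, url: str) -> None:
        self.url = url

    def content(self) -> str:
        return f"<html><title>{self.url}</title></html>"


def echo_resolver(query: str, html: str) -> dict:
    # Placeholder backend: reports what it received instead of calling an LLM.
    return {"query": query.strip(), "html_length": len(html)}


page = QueryablePage(FakePage(), echo_resolver)
page.goto("https://example.com")       # delegated to FakePage.goto
result = page.query_data("{ title }")  # provided by the wrapper
```

Delegating through `__getattr__` is one way to keep the wrapped object's full API available while adding query methods on top; the real SDK may achieve the same effect differently.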
graph TD
A[Developer] -->|Writes AgentQL Query| B[AgentQL SDK]
B -->|Wraps| C[Playwright Page Object]
C -->|Interacts with| D[Web Page]
D -->|Returns DOM| C
C -->|Processes| B
B -->|Structured JSON| A
E[LLM Backend] <-->|Natural Language Processing| B
Query Methods
The SDK provides three primary API methods for interacting with web pages:
| Method | Purpose | Use Case |
|---|---|---|
| query_elements() | Locate DOM elements | Automation, clicking, typing |
| query_data() | Extract structured data | Scraping, data collection |
| get_by_prompt() | Natural language element lookup | Finding elements by description |
Source: examples/python/first_steps/main.py:1-80
SDKs and Tools
AgentQL provides multiple entry points for different development environments:
Python SDK
The Python SDK integrates with Playwright's synchronous API for automation and scraping scripts.
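import agentql
from playwright.sync_api import sync_playwright
with sync_playwright() as playwright:
    # Launch a browser first; agentql.wrap() needs a Playwright page to wrap.
    browser = playwright.chromium.launch(headless=False)
    page = agentql.wrap(browser.new_page())
    # SEARCH_BOX_QUERY and PRODUCT_DATA_QUERY are AgentQL query strings defined elsewhere.
    response = page.query_elements(SEARCH_BOX_QUERY)
    data = page.query_data(PRODUCT_DATA_QUERY)
    browser.close()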
Installation: pip install agentql
Documentation: Python SDK Installation
Source: examples/python/first_steps/main.py:1-16
JavaScript SDK
The JavaScript SDK works with Playwright for Node.js environments.
import { chromium } from 'playwright';
import { wrap } from 'agentql';
async function main() {
  const browser = await chromium.launch();
  // wrap() is awaited, matching its usage in the JavaScript SDK section
  const page = await wrap(await browser.newPage());
  // Use page.queryElements() and page.queryData()
  await browser.close();
}
Installation: Available via npm
Documentation: JavaScript SDK Installation
Source: examples/js/first-steps/README.md
REST API
Execute AgentQL queries without installing an SDK via the REST API endpoint.
Documentation: REST API Reference
Source: README.md
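As a rough sketch of what an SDK-free call might look like, the snippet below builds an HTTP request with the Python standard library without sending it. The endpoint URL, the `X-API-Key` header name, and the `url`/`query` body fields are assumptions on my part rather than details taken from this manual; confirm all three against the REST API Reference before relying on them. `build_query_request` is a hypothetical helper.

```python
import json
import urllib.request

# Assumed endpoint and header name; verify both in the REST API Reference.
ASSUMED_ENDPOINT = "https://api.agentql.com/v1/query-data"
ASSUMED_KEY_HEADER = "X-API-Key"


def build_query_request(api_key: str, url: str, query: str) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying an AgentQL query."""
    body = json.dumps({"url": url, "query": query}).encode("utf-8")
    return urllib.request.Request(
        ASSUMED_ENDPOINT,
        data=body,
        headers={ASSUMED_KEY_HEADER: api_key, "Content-Type": "application/json"},
        method="POST",
    )


request = build_query_request(
    "YOUR_API_KEY",
    "https://scrapeme.live/shop",
    "{ products[] { name price(integer) } }",
)
payload = json.loads(request.data)  # decode the body again to inspect it
```

Passing `request` to `urllib.request.urlopen()` would send it; that step is left out so the sketch runs without network access or a real key.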
Additional Tools
| Tool | Purpose |
|---|---|
| Debugger Chrome Extension | Debug and refine queries in real-time on live sites |
| Playground | Interactive environment to test queries and export scripts |
| AgentQL Query Language | Define queries with natural language syntax |
| MCP Server | Integration for agent frameworks |
| LangChain Integration | Connect with LangChain for agentic workflows |
Source: README.md
Query Language
AgentQL queries use a GraphQL-like syntax to define what elements to find and what data to extract.
Basic Element Query
{
search_product_box
submit_button
results_container
}
Source: examples/python/first_steps/main.py:23-30
Data Extraction Query
{
price_currency
products[] {
name
price(integer)
}
}
The [] notation queries lists of items, and type annotations like (integer) apply transformations to extracted values.
Source: examples/python/first_steps/main.py:32-39
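To make the structure of these queries concrete, here is a small sketch that renders query text from a nested Python dict. `render_query` and its spec convention are hypothetical and not part of AgentQL; queries are normally written by hand as plain strings, and this helper treats every nested dict as a `[]` list.

```python
def render_query(fields: dict, indent: int = 0) -> str:
    """Render a nested field spec as AgentQL query text.

    Spec convention (an assumption of this sketch, not part of AgentQL):
      - value None -> plain field, e.g. `name`
      - value str  -> field with an annotation, e.g. `price(integer)`
      - value dict -> list of items, e.g. `products[] { ... }`
    """
    pad = "  " * indent
    lines = [pad + "{"]
    for name, spec in fields.items():
        if spec is None:
            lines.append(f"{pad}  {name}")
        elif isinstance(spec, str):
            lines.append(f"{pad}  {name}({spec})")
        else:
            # Render the nested block, then attach its opening brace to this line.
            nested = render_query(spec, indent + 1).lstrip()
            lines.append(f"{pad}  {name}[] {nested}")
    lines.append(pad + "}")
    return "\n".join(lines)


PRODUCT_DATA_QUERY = render_query(
    {"price_currency": None, "products": {"name": None, "price": "integer"}}
)
print(PRODUCT_DATA_QUERY)
```

The printed text has the same shape as the data extraction query shown above: one top-level field, one list, and one annotated field inside the list.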
Natural Language Prompt
For element location, you can use free-form natural language prompts:
NATURAL_LANGUAGE_PROMPT = "Button to display Qwilfish page"
qwilfish_page_btn = page.get_by_prompt(NATURAL_LANGUAGE_PROMPT)
This approach finds elements based on semantic understanding rather than structural selectors.
Source: examples/python/first_steps/main.py:42-47
Common Use Cases
Collecting List Data
Extract multiple items from a page, such as product listings or search results:
PRODUCT_DATA_QUERY = """
{
products[] {
name
price
link
}
}
"""
data = page.query_data(PRODUCT_DATA_QUERY)
Source: examples/python/list_query_usage/README.md
Handling Pagination
Step through multiple pages to collect large datasets:
// Collect HackerNews headlines across paginated pages.
// `page` is an AgentQL-wrapped Playwright page; HEADLINES_QUERY is defined elsewhere.
async function collectHeadlines(page, url, numPages) {
  const headlines = [];
  for (let i = 0; i < numPages; i++) {
    await page.goto(`${url}?p=${i + 1}`);
    const data = await page.queryData(HEADLINES_QUERY);
    headlines.push(...data.headlines);
  }
  return headlines;
}
Source: examples/js/collect-paginated-news-headlines/README.md
Form Automation
Fill out and submit forms using natural language queries:
const FORM_QUERY = `
{
username_field
password_field
submit_button
}
`;
const form = await page.queryElements(FORM_QUERY);
await form.username_field.fill('user@example.com'); // placeholder address
await form.submit_button.click();
Source: examples/js/submit-form/README.md
E-commerce Data Collection
Extract pricing and product information from online stores:
PRODUCT_DATA_QUERY = """
{
price_currency
products[] {
name
price(integer)
}
}
"""
response = page.query_elements(SEARCH_BOX_QUERY)
response.search_product_box.type(search_key_word, delay=200)
page.keyboard.press("Enter")
data = page.query_data(PRODUCT_DATA_QUERY)
Source: examples/python/first_steps/main.py:31-60
Waiting for Page Load
Ensure pages fully load before querying:
await page.goto(url);
// Wait for network idle and dynamic content
await page.waitForLoadState('networkidle');
const data = await page.queryData(DATA_QUERY);
Source: examples/js/wait-for-entire-page-load/README.md
Integration Patterns
With AI Agents
AgentQL is designed for AI agent workflows. The framework allows agents to:
- Navigate to any URL
- Query elements using natural language
- Extract structured data
- Perform actions (click, type, scroll)
graph LR
A[AI Agent] -->|Instruction| B[AgentQL SDK]
B -->|Query| C[Web Page]
C -->|Data| D[Structured Output]
D -->|Analysis| A
A -->|Action| B
Source: README.md
Cloudflare Workers Consideration
Users have explored using AgentQL with Cloudflare's Browser Rendering for edge environments. However, edge environments may have limitations with certain Node.js APIs that AgentQL depends on. See Issue #128 for community discussion on this integration pattern.
Source: Community Issue #128
Getting Started
Prerequisites
- Python 3.8+ or Node.js 18+
- Playwright installed
Installation
Python:
pip install agentql
playwright install chromium
JavaScript:
npm install agentql
npx playwright install chromium
Quick Start Steps
- Install the AgentQL SDK for your language
- Launch a browser with Playwright
- Wrap the page object with agentql.wrap()
- Write your first AgentQL query
- Use query_elements() for actions or query_data() for extraction
- Optional: Install the AgentQL Debugger Chrome Extension to test queries on live sites
Testing Your Queries
The AgentQL Playground at playground.agentql.com allows you to:
- Test queries on live websites
- Export working Python/JavaScript scripts
- Optimize query patterns
Source: README.md
Community Resources
| Resource | Link |
|---|---|
| Documentation | docs.agentql.com |
| Discord Community | discord.gg/agentql |
| X (Twitter) | @agentql |
| tinyfish-ai | |
| Deep-dive Article | Starlog Analysis |
Known Limitations
- Element resolution may occasionally return generic containers instead of specific elements (see Issue #121)
- Edge environment compatibility requires additional configuration for Cloudflare Workers (Issue #128)
Summary
AgentQL bridges the gap between LLMs and web automation by providing a natural language query interface that abstracts away brittle CSS/XPath selectors. Its dual Python and JavaScript SDKs integrate with Playwright, making it accessible for both backend automation scripts and modern web agent frameworks. The structured output capability, combined with transforms and cross-site compatibility, makes AgentQL a robust choice for building maintainable web scraping and automation solutions.
Source: https://github.com/tinyfish-io/agentql / Human Manual
Quick Start Guide
Related topics: Python SDK, JavaScript SDK
AgentQL is a query language and SDK designed to connect LLMs and AI agents to the web. This guide provides everything you need to start using AgentQL within 5 minutes, whether you're using Python or JavaScript.
Prerequisites
Before beginning, ensure you have the following installed:
| Requirement | Version | Purpose |
|---|---|---|
| Python | 3.8+ | For Python SDK usage |
| Node.js | 18+ | For JavaScript SDK usage |
| Playwright | Latest | Browser automation |
| AgentQL SDK | Latest | Core library |
Python SDK Installation
Install the AgentQL Python SDK using pip:
pip install agentql
Install Playwright with the required browsers:
pip install playwright
playwright install chromium
JavaScript SDK Installation
Install the AgentQL JavaScript SDK using npm:
npm install agentql
npx playwright install chromium
Core Concepts
Understanding these fundamental concepts will help you write effective AgentQL queries:
AgentQL Query Language
AgentQL uses a compact, GraphQL-like query syntax to describe what data to extract or which elements to interact with on a web page. Field names are descriptive, natural-language-style identifiers, which makes queries intuitive and self-documenting.
{
search_product_box
products[] {
name
price(integer)
}
}
Source: examples/python/first_steps/main.py:29-36
Smart Locator vs Data Query API
AgentQL provides two distinct APIs:
| API Type | Method | Purpose |
|---|---|---|
| Smart Locator | query_elements() | Locate elements for interaction |
| Data Query | query_data() | Extract structured data |
Your First Script
Python Quick Start
Create a new file named main.py and add the following code:
#!/usr/bin/env python3
import agentql
from agentql.ext.playwright.sync_api import Page
from playwright.sync_api import sync_playwright
URL = "https://scrapeme.live/shop"
# Query to locate the search box element
SEARCH_BOX_QUERY = """
{
search_product_box
}
"""
# Query for data extraction
PRODUCT_DATA_QUERY = """
{
price_currency
products[] {
name
price(integer)
}
}
"""
def main():
    with sync_playwright() as playwright, playwright.chromium.launch(headless=False) as browser:
        page = agentql.wrap(browser.new_page())
        page.goto(URL)
        product_data = page.query_data(PRODUCT_DATA_QUERY)
        print(product_data)
if __name__ == "__main__":
    main()
Source: examples/python/first_steps/main.py:1-45
Run the script:
python3 main.py
JavaScript Quick Start
Create a new file named main.js:
const agentql = require('agentql');
const { chromium } = require('playwright');
const URL = "https://scrapeme.live/shop";
const PRODUCT_QUERY = `
{
price_currency
products[] {
name
price(integer)
}
}
`;
async function main() {
  const browser = await chromium.launch({ headless: false });
  // Await newPage() before wrapping, as in the JavaScript SDK section
  const page = await agentql.wrap(await browser.newPage());
  await page.goto(URL);
  const productData = await page.queryData(PRODUCT_QUERY);
  console.log(productData);
  await browser.close();
}
main();
Source: examples/js/collect-paginated-news-headlines/README.md:18-36
Run the script:
node main.js
Workflow Overview
graph TD
A[Install AgentQL SDK] --> B[Import AgentQL Library]
B --> C[Launch Browser with Playwright]
C --> D[Wrap Page with AgentQL]
D --> E[Write AgentQL Query]
E --> F[Execute Query]
F --> G[Process Results]
G --> H[Close Browser]
Common Usage Patterns
Extracting Paginated Data
To collect data across multiple pages, use a loop with navigation:
import agentql
from playwright.async_api import async_playwright
# QUERY is an AgentQL query string that returns an `items` list.
async def collect_paginated_news():
    # async_playwright() is the async counterpart of sync_playwright()
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        page = await agentql.wrap_async(browser.new_page())
        all_items = []
        for page_num in range(3):  # Collect 3 pages
            await page.goto(f"https://news.ycombinator.com?p={page_num + 1}")
            data = await page.query_data(QUERY)
            all_items.extend(data.get("items", []))
        await browser.close()
        return all_items
Source: examples/python/collect_paginated_news_headlines/README.md:1-22
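The accumulation logic in the loop above can be separated from the browser and exercised on its own. The sketch below assumes a `fetch_page` callable that stands in for `page.goto(...)` followed by `page.query_data(QUERY)`; `collect_pages`, `fake_fetch`, and `FAKE_SITE` are hypothetical names used only for illustration.

```python
from typing import Callable


def collect_pages(fetch_page: Callable[[int], dict], num_pages: int, key: str = "items") -> list:
    """Call fetch_page(1..num_pages) and concatenate the lists found under `key`."""
    collected = []
    for page_num in range(1, num_pages + 1):
        data = fetch_page(page_num)
        items = data.get(key) or []  # tolerate a missing or null list
        if not items:
            break  # stop early when a page comes back empty
        collected.extend(items)
    return collected


# Stand-in for navigating to a page and running a query against it.
FAKE_SITE = {1: {"items": ["a", "b"]}, 2: {"items": ["c"]}, 3: {"items": []}}


def fake_fetch(page_num: int) -> dict:
    return FAKE_SITE.get(page_num, {})


headlines = collect_pages(fake_fetch, num_pages=5)
```

Stopping at the first empty page avoids issuing further queries once the site has run out of results, which may matter when each query is a billed API call.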
Multi-URL Data Collection
Fetch data from multiple websites concurrently using async patterns:
import asyncio
import agentql
from agentql.ext.playwright.async_api import Page
from playwright.async_api import async_playwright
WEBSITE_URLS = [
    "https://duckduckgo.com/?q=agents+for+the+web&t=h_&iar=news&ia=news",
]
async def main():
    async with async_playwright() as p:
        async with await p.chromium.launch(headless=True) as browser:
            async with await browser.new_context() as context:
                await asyncio.gather(
                    *(fetch_data(context, url) for url in WEBSITE_URLS)
                )
async def fetch_data(context, session_url):
    page = await agentql.wrap_async(await context.new_page())
    await page.goto(session_url)
    data = await page.query_data(QUERY)
    return data
Source: examples/python/news-aggregator/main.py:17-36
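With a longer URL list, it may be worth capping how many pages are open at once and keeping one failure from discarding the other results. The sketch below shows one way to do that with `asyncio.Semaphore`; `gather_bounded` and `fake_fetch` are hypothetical, and `fake_fetch` stands in for wrapping a page, navigating, and calling `query_data`.

```python
import asyncio


async def gather_bounded(urls, fetch, limit: int = 2):
    """Run fetch(url) for every URL, with at most `limit` running at once."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(url):
        async with semaphore:
            return await fetch(url)

    # return_exceptions=True collects failures as values instead of raising
    # on the first one, so successful results are not lost.
    return await asyncio.gather(*(guarded(u) for u in urls), return_exceptions=True)


async def fake_fetch(url: str) -> dict:
    # Stand-in for the real page work; fails for any URL containing "bad".
    await asyncio.sleep(0)
    if "bad" in url:
        raise RuntimeError(f"could not load {url}")
    return {"url": url}


results = asyncio.run(
    gather_bounded(
        ["https://a.example", "https://bad.example", "https://c.example"],
        fake_fetch,
    )
)
```

Results come back in the same order as the input URLs, so a failed entry can be matched to its URL by position.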
Synchronous vs Asynchronous Execution
AgentQL supports both synchronous and asynchronous patterns:
| Pattern | Use Case | API |
|---|---|---|
| Synchronous | Simple scripts, sequential operations | agentql.wrap() |
| Asynchronous | Concurrent operations, better performance | agentql.wrap_async() |
Synchronous example:
import agentql
from playwright.sync_api import sync_playwright
def main():
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = agentql.wrap(browser.new_page())
        page.goto(URL)
        data = page.query_data(QUERY)
        browser.close()
Source: examples/python/news-aggregator/main_sync.py:17-27
Running Examples in Google Colab
You can run AgentQL examples directly in Google Colab without local installation:
- Navigate to the Google Colab example
- Open main.ipynb in Colab
- Run cells sequentially
This approach is useful for quick experimentation without setting up a local environment.
Writing Effective Queries
Querying Lists
Use array syntax [] to query multiple elements:
{
products[] {
name
price
description
}
}
Data Type Transformations
Apply type conversions within queries:
{
products[] {
name
price(integer) # Convert to integer
rating(float) # Convert to float
}
}
Source: examples/python/first_steps/main.py:34-36
Natural Language Prompts
For element location, use natural language prompts:
NATURAL_LANGUAGE_PROMPT = "Button to display Qwilfish page"
This allows flexible element selection based on descriptive intent rather than CSS selectors.
Troubleshooting Common Issues
Element Resolution Problems
If a query resolves to a generic wrapper element (such as a bare span) instead of the intended target, or fails to locate the expected element:
- Verify the URL matches the expected page structure
- Use the AgentQL Debugger Chrome extension to test queries
- Check that the page has fully loaded before querying
Source: issues/tinyfish-io/agentql#121
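When the cause is a page that has not finished rendering, re-running the query after a short pause is one possible workaround. The sketch below is a generic retry helper, not an AgentQL feature: `query_with_retry` is a hypothetical name, and `fake_query` stands in for something like `lambda: page.query_data(QUERY)`.

```python
import time


def query_with_retry(run_query, is_usable, attempts: int = 3, delay_s: float = 0.0):
    """Call run_query() until is_usable(result) is true or attempts run out."""
    last = None
    for attempt in range(attempts):
        last = run_query()
        if is_usable(last):
            return last
        if attempt < attempts - 1:
            time.sleep(delay_s)  # give dynamic content time to render
    return last


# Simulates a page whose product list appears only on the third query.
responses = iter([
    {"products": []},
    {"products": []},
    {"products": [{"name": "Bulbasaur"}]},
])
calls = []


def fake_query():
    calls.append(1)
    return next(responses)


data = query_with_retry(fake_query, lambda d: bool(d.get("products")))
```

In a real script a non-zero `delay_s` would be appropriate; it defaults to zero here only so the sketch runs instantly.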
Cloudflare Browser Rendering
When using AgentQL with Cloudflare's Browser Rendering:
- Edge environments may have Node.js API limitations
- Some synchronous Playwright APIs may not be available
- Consider using async patterns for edge compatibility
Source: issues/tinyfish-io/agentql#128
Next Steps
After completing this quick start guide:
| Resource | Description |
|---|---|
| AgentQL Query Language | Deep dive into query syntax |
| Python SDK Reference | Complete API documentation |
| JavaScript SDK Reference | JS API documentation |
| Examples Repository | Full example collection |
| Discord Community | Get help and share feedback |
Key Takeaways
- Installation is straightforward - A single package install gets you started
- Two API modes - Choose sync for simplicity or async for performance
- Natural language queries - Write queries that describe intent, not selectors
- Structured output - Data returns in the shape you define in your query
- Cross-site compatibility - Queries work across similar sites with comparable content
Get started in 5 minutes by running the example scripts above, then explore the official documentation for advanced features and integrations.
Python SDK
Related topics: JavaScript SDK, Browser Modes and Configuration
The AgentQL Python SDK provides a powerful interface for connecting LLMs and AI agents to the web through structured data queries and intelligent element location. Built as a wrapper around Microsoft Playwright, the SDK enables developers to extract structured data, interact with web elements, and automate browser workflows using AgentQL's query language and natural language prompts.
Overview
The Python SDK serves as the primary programming interface for Python developers building web automation, data extraction, and AI agent applications. It wraps Playwright's Page objects to provide AgentQL-specific querying capabilities while maintaining full access to Playwright's browser automation features.
Key Capabilities
| Capability | Description |
|---|---|
| Structured Data Extraction | Query web pages using AgentQL's query language to extract typed, structured data |
| Natural Language Element Selection | Locate elements using intuitive prompts instead of CSS selectors |
| Cross-Site Compatibility | Write queries once and use them across similar websites |
| Dual API Support | Available in both synchronous and asynchronous implementations |
| Playwright Integration | Full access to Playwright's browser automation features |
Source: README.md:1-15
Installation
Prerequisites
- Python 3.12 or later (Python 3.13 recommended)
- Playwright browser binaries installed
Installation via pip
pip install agentql
Browser Binary Setup
After installing the SDK, initialize Playwright browsers:
playwright install chromium
Per the project's golden-image definitions, the SDK is tested and recommended with Python 3.13 on a Debian 12 (Bookworm) slim base image, and with Playwright v1.58.2 on Ubuntu 24.04 LTS.
Source: golden-images.yaml:1-30
Core API Methods
Wrapping a Page Object
To access AgentQL's querying capabilities, wrap a Playwright page object using agentql.wrap():
import agentql
from playwright.sync_api import sync_playwright
with sync_playwright() as playwright:
    browser = playwright.chromium.launch(headless=True)
    page = agentql.wrap(browser.new_page())
Source: examples/python/first_steps/main.py:35-39
query_data()
Extracts structured data from the page using an AgentQL query. Returns a dictionary matching the query structure.
PRODUCT_DATA_QUERY = """
{
price_currency
products[] {
name
price(integer)
}
}
"""
data = page.query_data(PRODUCT_DATA_QUERY)
print(data)
Parameters:
| Parameter | Type | Description |
|---|---|---|
| query | str | AgentQL query defining the data structure to extract |
| timeout | int | Maximum wait time in milliseconds (default: 30000) |
Returns: Dictionary with keys matching the query fields
Source: examples/python/first_steps/main.py:30-34
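Because the returned dictionary mirrors the query, it can be converted into typed records with ordinary Python. The sketch below assumes the response shape implied by `PRODUCT_DATA_QUERY`; the sample values are illustrative rather than captured from a real run, and `Product` and `parse_products` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Product:
    name: str
    price: int


def parse_products(response: dict) -> tuple:
    """Turn a query_data-style dict into (currency, [Product, ...])."""
    products = [
        Product(name=str(item["name"]), price=int(item["price"]))
        for item in response.get("products") or []
        # Skip entries where either field is absent or null.
        if item.get("name") is not None and item.get("price") is not None
    ]
    return response.get("price_currency"), products


# Illustrative values only; not captured from a real run.
sample_response = {
    "price_currency": "GBP",
    "products": [
        {"name": "Bulbasaur", "price": 63},
        {"name": "Ivysaur", "price": None},  # a field the page did not show
    ],
}
currency, products = parse_products(sample_response)
```

Skipping incomplete entries is a design choice of this sketch; a scraper that must account for every listing might keep them and record the missing field instead.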
query_elements()
Locates DOM elements matching an AgentQL query, returning element references that can be interacted with using Playwright's API.
SEARCH_BOX_QUERY = """
{
search_product_box
}
"""
response = page.query_elements(SEARCH_BOX_QUERY)
response.search_product_box.type("fish", delay=200)
page.keyboard.press("Enter")
Parameters:
| Parameter | Type | Description |
|---|---|---|
| query | str | AgentQL query defining elements to locate |
| timeout | int | Maximum wait time in milliseconds (default: 30000) |
Returns: Object with attributes matching query field names, containing Playwright Locator objects
Source: examples/python/first_steps/main.py:52-59
get_by_prompt()
Locates elements using natural language prompts. This method uses AI to find elements based on their semantic meaning rather than DOM structure.
# Locate the search bar using natural language
search_bar = page.get_by_prompt("the search bar")
search_bar.fill("AgentQL")
# Click a button using a description
page.get_by_prompt("the search button").click()
Parameters:
| Parameter | Type | Description |
|---|---|---|
| prompt | str | Natural language description of the element |
| timeout | int | Maximum wait time in milliseconds (default: 30000) |
Returns: Playwright Locator object for the matched element, or None if not found
Source: examples/python/get_by_prompt/main.py:18-26
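Since the method is described as returning None when nothing matches, callers may want to check the result before acting on it. The sketch below shows that guard against a fake page object so it runs without a browser; `click_if_found`, `FakePage`, and `FakeElement` are hypothetical names.

```python
def click_if_found(page, prompt: str) -> bool:
    """Click the element described by `prompt`; report whether it was found."""
    element = page.get_by_prompt(prompt)
    if element is None:
        return False
    element.click()
    return True


class FakeElement:
    """Records whether click() was called."""

    def __init__(self):
        self.clicked = False

    def click(self):
        self.clicked = True


class FakePage:
    """Knows a single prompt; returns None for anything else."""

    def __init__(self):
        self.button = FakeElement()

    def get_by_prompt(self, prompt: str):
        return self.button if prompt == "the search button" else None


fake_page = FakePage()
found = click_if_found(fake_page, "the search button")
missing = click_if_found(fake_page, "the checkout button")
```

Returning a boolean lets the calling script decide whether a missing element is fatal or merely worth logging.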
Asynchronous API
For applications requiring concurrent operations, use the async API with async_playwright and agentql.wrap_async():
import asyncio
import agentql
from playwright.async_api import async_playwright
async def main():
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        async with await browser.new_context() as context:
            page = await agentql.wrap_async(context.new_page())
            await page.goto("https://example.com")
            data = await page.query_data(QUERY)
Source: examples/python/news-aggregator/main.py:28-38
Concurrent Page Operations
The async API enables concurrent data fetching from multiple pages:
async def main():
    async with async_playwright() as p, await p.chromium.launch(headless=True) as browser:
        async with await browser.new_context() as context:
            # gather() returns one result per URL, in the same order as WEBSITE_URLS
            results = await asyncio.gather(
                *(fetch_data(context, url) for url in WEBSITE_URLS)
            )
async def fetch_data(context, url):
    page = await agentql.wrap_async(context.new_page())
    await page.goto(url)
    data = await page.query_data(QUERY)
    return data  # returned so that gather() can collect it
Source: examples/python/news-aggregator/main.py:28-44
Common Usage Patterns
E-commerce Data Extraction
Extract product information from e-commerce websites:
QUERY = """
{
products[]
{
name
price(integer)
}
}
"""
page.goto("https://scrapeme.live/shop")
response = page.query_data(QUERY)
# Write to CSV
with open("product_data.csv", "w", encoding="utf-8") as file:
    file.write("Name, Price\n")
    for product in response["products"]:
        file.write(f"{product['name']},{product['price']}\n")
Source: examples/python/list_query_usage/main.py:14-30
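Writing rows with f-strings works until a product name contains a comma or a quote. As an alternative sketch, the standard library's `csv` module handles quoting automatically. The rows below are illustrative, and `write_products_csv` is a hypothetical helper; it writes to any file-like object, so an in-memory buffer is used here.

```python
import csv
import io


def write_products_csv(products: list, out) -> None:
    """Write name/price rows, letting the csv module handle quoting."""
    writer = csv.DictWriter(out, fieldnames=["name", "price"], extrasaction="ignore")
    writer.writeheader()
    writer.writerows(products)


# Illustrative rows; the first name contains a comma on purpose.
rows = [
    {"name": "Charizard, Mega", "price": 156},
    {"name": "Pikachu", "price": 81},
]
buffer = io.StringIO()
write_products_csv(rows, buffer)
csv_text = buffer.getvalue()
```

To write to disk instead, open the file with `newline=""` as the `csv` documentation recommends, and pass the file object as `out`.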
Multi-Site Price Comparison
Compare product prices across different websites using the same query:
PRODUCT_INFO_QUERY = """
{
nintendo_switch_price
}
"""
page.goto(NINTENDO_URL)
response = page.query_data(PRODUCT_INFO_QUERY)
print("Price at Nintendo: ", response["nintendo_switch_price"])
page.goto(TARGET_URL)
response = page.query_data(PRODUCT_INFO_QUERY)
print("Price at Target: ", response["nintendo_switch_price"])
Source: examples/python/compare_product_prices/main.py:20-31
List Data Extraction
Query lists of items on a page:
QUERY = """
{
listings[]
{
name
rating
description
order_link
take_out_link
address
hours
}
}
"""
response = page.query_data(QUERY)
for listing in response["listings"]:
    file.write(
        f"{listing['name']},{listing['rating']},{listing['description']}...\n"
    )
Source: examples/python/maps_scraper/main.py:1-15
Paginated Data Collection
Collect data across multiple pages by navigating through pagination:
for page_num in range(num_pages):
    page.goto(f"{BASE_URL}&page={page_num}")
    data = page.query_data(QUERY)
    all_results.extend(data["items"])
Source: examples/python/collect_paginated_news_headlines/README.md:1-20
Data Transformations
AgentQL queries support inline transformations to format extracted data:
QUERY = """
{
items[]{
published_date(convert to XX/XX/XXXX format)
entry(title or post if no title is available)
author(person's name; return "n/a" if not available)
outlet(the original platform it is posted on)
url
}
}
"""
The SDK supports:
- Type conversions (e.g., price(integer))
- Date format transformations
- Default values for missing fields
- Conditional extraction logic
Source: examples/python/news-aggregator/main_sync.py:20-28
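Because these transformations are expressed in natural language, it can be prudent to validate the values that come back before storing them. The sketch below checks the date format requested in the query above and maps the "n/a" fallback to None; the items are illustrative rather than real output, and `clean_item` and `DATE_PATTERN` are hypothetical names.

```python
import re

# Matches the XX/XX/XXXX shape requested in the query (digits only).
DATE_PATTERN = re.compile(r"^\d{2}/\d{2}/\d{4}$")


def clean_item(item: dict) -> dict:
    """Normalize one extracted item; unknown or malformed values become None."""
    author = item.get("author")
    date = item.get("published_date")
    return {
        "author": None if author in (None, "", "n/a") else author,
        "published_date": date if isinstance(date, str) and DATE_PATTERN.match(date) else None,
        "url": item.get("url"),
    }


# Illustrative items; not captured from a real run.
raw_items = [
    {"author": "n/a", "published_date": "03/14/2025", "url": "https://example.com/a"},
    {"author": "Ada", "published_date": "last week", "url": "https://example.com/b"},
]
cleaned = [clean_item(item) for item in raw_items]
```

The pattern checks only the shape of the date, not whether it is a valid calendar date; `datetime.strptime` could be added where that matters.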
Advanced Configuration
Headless Mode
Run browsers in headless mode for server-side or CI environments:
with sync_playwright() as playwright:
    playwright.chromium.launch(headless=True)  # Default for CI/CD
For debugging, disable headless mode:
with sync_playwright() as playwright:
    playwright.chromium.launch(headless=False)  # Visible browser
Browser Contexts
Use browser contexts to isolate sessions, cookies, and state:
async with await browser.new_context() as context:
    # Each context has independent storage
    page = await agentql.wrap_async(context.new_page())
Logging Configuration
Configure logging for debugging and monitoring:
import logging
logging.basicConfig(level=logging.INFO)
log = logging.getLogger(__name__)
log.info("All done! CSV is here: %s", CSV_FILE_PATH)
Source: examples/python/news-aggregator/main.py:14-15
Architecture
graph TD
A[Python Application] --> B[AgentQL SDK]
B --> C[Playwright API]
C --> D[Browser Instance]
E[AgentQL Query Language] --> B
F[Natural Language Prompts] --> B
G[Web Page DOM] --> D
D --> H[Structured Data Response]
B --> H
subgraph "AgentQL SDK Components"
B
I[query_data method]
J[query_elements method]
K[get_by_prompt method]
end
I --> B
J --> B
K --> B
Relationship to JavaScript SDK
The Python SDK's API closely parallels the JavaScript SDK's: the same operations are available under snake_case names in Python and camelCase names in JavaScript, which eases cross-language development:
| Feature | Python SDK | JavaScript SDK |
|---|---|---|
| Wrap Page | agentql.wrap(page) | agentql.wrap(page) |
| Async Wrap | await agentql.wrap_async(page) | await wrap(page) |
| Query Data | page.query_data(QUERY) | page.queryData(QUERY) |
| Query Elements | page.query_elements(QUERY) | page.queryElements(QUERY) |
| By Prompt | page.get_by_prompt("text") | page.getByPrompt("text") |
Both SDKs use the same AgentQL query language and provide equivalent functionality for their respective platforms.
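The naming relationship between the page methods in the table is mechanical, which the following sketch makes explicit. `to_camel_case` is a hypothetical helper written for illustration, and it is applied only to the three page-level query methods listed above.

```python
def to_camel_case(snake_name: str) -> str:
    """Convert a snake_case identifier to camelCase."""
    first, *rest = snake_name.split("_")
    return first + "".join(part.capitalize() for part in rest)


# The three page-level query methods from the table above.
PAGE_METHODS = ["query_data", "query_elements", "get_by_prompt"]
js_names = {name: to_camel_case(name) for name in PAGE_METHODS}
```

Each resulting value matches the JavaScript column of the table, so a script ported between the two SDKs mostly needs its method names and its async handling adjusted.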
See Also
- JavaScript SDK - For Node.js and browser environments
- REST API - Serverless query execution
- AgentQL Query Language - Query syntax reference
- Chrome Extension - Debug and develop queries interactively
- Examples Repository - Complete working examples
JavaScript SDK
Related topics: Python SDK, REST API
The AgentQL JavaScript SDK enables developers to build web automation and scraping applications using natural language queries. It provides a seamless integration with Playwright, allowing JavaScript and Node.js developers to leverage AgentQL's query language for extracting structured data from web pages.
Overview
The JavaScript SDK wraps Playwright's browser automation capabilities with AgentQL's intelligent querying layer. This combination allows developers to:
- Query web pages using natural language descriptions
- Extract structured data without relying on CSS selectors or XPath
- Build resilient automation scripts that adapt to UI changes
- Execute queries across multiple browser contexts simultaneously
Source: examples/js/package.json:1-28
SDK Dependencies
| Dependency | Version | Purpose |
|---|---|---|
| agentql | latest | Core SDK package |
| playwright | ^1.48.2 | Browser automation framework |
| playwright-dompath | ^0.0.7 | DOM path resolution |
| openai | ^4.70.1 | LLM integration for query processing |
Source: examples/js/package.json:18-22
Installation
Prerequisites
- Node.js environment
- Playwright browsers installed
Setup
const { wrap, configure } = require('agentql');
const { chromium } = require('playwright');
Configuration
Configure the SDK with your API key:
configure({
apiKey: process.env.AGENTQL_API_KEY, // Optional, uses default if omitted
});
Source: examples/js/get-by-prompt/main.js:10-12
Core API
Wrapping a Playwright Page
The wrap() function transforms a standard Playwright Page object into an AgentQL-enabled page that supports natural language queries:
const { wrap } = require('agentql');
const { chromium } = require('playwright');
async function main() {
const browser = await chromium.launch({ headless: false });
const page = await wrap(await browser.newPage());
await page.goto('https://example.com');
// Now page has AgentQL query capabilities
}
Source: examples/js/get-by-prompt/main.js:14-17
getByPrompt Method
The getByPrompt() method locates elements using natural language descriptions. This is the primary way to interact with page elements:
// Locate a sign up button by describing what it does
const signUpBtn = await page.getByPrompt('Sign up button');
// Click the element if found
if (signUpBtn) {
await signUpBtn.click();
}
Source: examples/js/get-by-prompt/main.js:24-30
queryData Method
The queryData() method extracts structured data from the page using AgentQL's query language:
const query = `
{
products[] {
name
model
sku
price(integer)
}
}
`;
const data = await page.queryData(query);
console.log(data.products);
Source: examples/js/collect-pricing-data/main.js:12-23
AgentQL Query Language
The query language uses a GraphQL-like syntax to define the structure of desired data. Queries are processed by LLMs to find matching elements on the page.
Basic Query Structure
const query = `
{
items[]
{
published_date
entry
author
outlet
url
}
}
`;
Source: examples/js/news-aggregator/main.js:10-18
List Extraction
Use the [] notation to query arrays of items:
const query = `
{
products[] {
name
price
}
}
`;
Source: examples/js/collect-pricing-data/main.js:12-18
Data Transforms
Apply transforms within queries to modify extracted values:
const query = `
{
items[] {
published_date(convert to XX/XX/XXXX format)
entry(title or post if no title is available)
}
}
`;
Source: examples/js/news-aggregator/main.js:10-15
Type Conversions
Specify data types for extracted values:
const query = `
{
products[] {
name
price(integer)
}
}
`;
Source: examples/js/collect-pricing-data/main.js:15-17
Fallback Values
Handle missing data gracefully:
const query = `
{
items[] {
author(person's name; return "n/a" if not available)
outlet(the original platform it is posted on; if no platform is listed, use the root domain of the url)
}
}
`;
Source: examples/js/news-aggregator/main.js:15-18
Common Use Cases
Searching and Filtering
async function searchProduct(page, product, minPrice, maxPrice) {
// Find search input using natural language
const searchInput = await page.getByPrompt('the search input field');
if (!searchInput) {
console.log('Search input field not found.');
return false;
}
// Type with realistic delay
await searchInput.type(product, { delay: 200 });
await searchInput.press('Enter');
// Fill price range filters
const minPriceInput = await page.getByPrompt('the min price input field');
if (minPriceInput) {
await minPriceInput.fill(String(minPrice));
}
const maxPriceInput = await page.getByPrompt('the max price input field');
if (maxPriceInput) {
await maxPriceInput.fill(String(maxPrice));
await maxPriceInput.press('Enter');
}
return true;
}
Source: examples/js/collect-pricing-data/main.js:27-49
Pagination Handling
async function goToTheNextPage(page) {
const nextPageQuery = `
{
pagination {
prev_page
next_page
}
}
`;
// Query and interact with pagination controls
}
Source: examples/js/collect-pricing-data/main.js:53-63
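The pagination stub above leaves the loop itself to the reader. One way to structure it, sketched here in Python under the assumption that extraction and navigation are supplied as two callables, is shown below. The helper collect_all_pages and the fake callables are hypothetical. In a real script, extract would wrap a queryData / query_data call and go_next would click the next_page element returned by the pagination query.

```python
from typing import Any, Callable


def collect_all_pages(
    extract: Callable[[], list[Any]],
    go_next: Callable[[], bool],
    max_pages: int = 10,
) -> list[Any]:
    """Call extract() on each page, then go_next() until it returns False
    or max_pages pages have been visited."""
    results: list[Any] = []
    for _ in range(max_pages):
        results.extend(extract())
        if not go_next():
            break
    return results


# Stand-in for a real site: three "pages" of data.
pages = [["a", "b"], ["c"], ["d", "e"]]
position = {"index": 0}


def fake_extract() -> list[str]:
    return pages[position["index"]]


def fake_go_next() -> bool:
    if position["index"] + 1 >= len(pages):
        return False  # no next page available
    position["index"] += 1
    return True


collected = collect_all_pages(fake_extract, fake_go_next)
```

The max_pages cap is a safety net for sites whose next-page control never disappears.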
Multi-Tab Data Collection
const websiteUrls = [
'https://bsky.app/search?q=agents+for+the+web',
'https://dev.to/search?q=agents%20for+the+web',
'https://hn.algolia.com/?query=agents%20for+the+web',
];
async function fetchData(context, sessionUrl) {
const page = await wrap(await context.newPage());
await page.goto(sessionUrl);
const data = await page.queryData(query);
// Process extracted data
}
// Fetch from multiple URLs concurrently
await Promise.all(websiteUrls.map((url) => fetchData(context, url)));
Source: examples/js/news-aggregator/main.js:26-41
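When many URLs are fetched at once, it can help to cap how many tabs run in parallel. The sketch below shows the same fan-out in Python with a concurrency limit, assuming each fetch is an async callable that takes a URL. The names gather_limited and fake_fetch are hypothetical, and fake_fetch only simulates a page visit.

```python
import asyncio
from typing import Awaitable, Callable


async def gather_limited(
    urls: list[str],
    fetch: Callable[[str], Awaitable[dict]],
    limit: int = 2,
) -> list[dict]:
    """Run fetch(url) for every URL, with at most `limit` running at once."""
    semaphore = asyncio.Semaphore(limit)

    async def run_one(url: str) -> dict:
        async with semaphore:
            return await fetch(url)

    # asyncio.gather returns results in the same order as its inputs.
    return await asyncio.gather(*(run_one(url) for url in urls))


async def fake_fetch(url: str) -> dict:
    """Simulated page visit; a real version would open a tab and run a query."""
    await asyncio.sleep(0.01)
    return {"url": url, "items": []}


urls = ["https://example.com/a", "https://example.com/b", "https://example.com/c"]
results = asyncio.run(gather_limited(urls, fake_fetch))
```

A small limit keeps memory use predictable when each task holds an open browser page.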
Workflow Diagram
graph TD
A[Initialize Browser] --> B[Wrap Page with AgentQL]
B --> C[Configure API Key]
C --> D[Navigate to URL]
D --> E[Execute Query or getByPrompt]
E --> F{Query Type?}
F -->|Data Extraction| G[queryData returns structured JSON]
F -->|Element Interaction| H[getByPrompt returns element]
G --> I[Process Results]
H --> J[Interact with Element]
J --> K[Wait for Navigation/Update]
K --> E
I --> L[Close Browser]
Configuration Options
Browser Launch Options
const browser = await chromium.launch({
headless: false // or true for headless mode
});
Source: examples/js/get-by-prompt/main.js:15
Browser Context Options
const context = await browser.newContext();
// Create multiple pages within the same context for concurrent operations
const page1 = await context.newPage();
const page2 = await context.newPage();
Source: examples/js/news-aggregator/main.js:27-31
Development Tools
Linting and Formatting
The SDK project includes pre-configured linting and formatting:
# Run ESLint
npm run lint
# Run Prettier
npm run format
Source: examples/js/package.json:7-10
Available Dev Dependencies
| Package | Version | Purpose |
|---|---|---|
| eslint | ^8.57.0 | JavaScript linting |
| eslint-config-prettier | ^9.1.0 | Disables ESLint rules that conflict with Prettier |
| prettier | ^2.8.7 | Code formatting |
| @trivago/prettier-plugin-sort-imports | ^4.3.0 | Import sorting |
Source: examples/js/package.json:11-15
Security Overrides
The SDK includes dependency version overrides for security patches:
"overrides": {
"axios": "^1.15.0",
"flatted": "^3.4.2",
"follow-redirects": "^1.16.0",
"lodash": "^4.18.0",
"minimatch": "^3.1.3"
}
Source: examples/js/package.json:23-28
Known Limitations
Cloudflare Browser Rendering Compatibility
There is an open issue regarding compatibility with Cloudflare's Browser Rendering in edge environments. Cloudflare Workers use a restricted Node.js runtime that may not fully support all Playwright and AgentQL features. Developers targeting Cloudflare Workers should be aware of potential limitations with browser instance access.
Source: GitHub Issue #128
Element Resolution Edge Cases
In some cases, elements may be resolved as generic containers (e.g., <span>) rather than semantic elements. This can affect element location accuracy. When encountering such issues, try using more specific prompt descriptions or combining with Playwright's native locators.
Source: GitHub Issue #121
Additional Resources
| Resource | Description |
|---|---|
| Installation Guide | Full SDK installation instructions |
| Query Language Docs | Complete AgentQL query language reference |
| Chrome Extension | Debug and test queries in real-time |
| Playground | Interactive query testing environment |
| Examples Directory | Complete list of JavaScript examples |
Source: https://github.com/tinyfish-io/agentql / Human Manual
REST API
Related topics: Python SDK, JavaScript SDK
AgentQL provides a REST API as an alternative to the Python and JavaScript SDKs for executing queries without requiring a full SDK installation. The REST API enables developers to interact with the AgentQL query engine over HTTP, making it suitable for environments where SDK integration is not practical or for quick prototyping and testing.
Overview
The REST API is one of three tool options provided by AgentQL alongside the Python SDK and JavaScript SDK. It allows executing queries against web pages without needing to set up Playwright or maintain a browser automation environment locally.
Source: README.md
Architecture
graph TD
A[Client Application] -->|HTTP POST /query| B[AgentQL REST API]
B -->|Parse & Process Query| C[Query Engine]
C -->|DOM Analysis| D[Web Page Content]
D -->|Extracted Data| B
B -->|JSON Response| A
E[SDK Client] -->|Internal Request| B
B -->|Same Flow| D
When to Use the REST API
| Use Case | Recommended Tool | Notes |
|---|---|---|
| Server-side scraping with Python | Python SDK | Full Playwright integration |
| Browser automation in Node.js | JavaScript SDK | Native async support |
| Quick testing/prototyping | REST API | No SDK installation required |
| Edge environments | REST API | Lightweight HTTP requests only |
| External integrations | REST API | Language-agnostic interface |
Core Capabilities
Query Execution
The REST API supports the same AgentQL query language available in the SDKs. Queries can extract structured data from web pages using natural language selectors and path-based element queries.
Example query structure:
{
  "query": "{ items[] { title price url } }",
  "url": "https://example.com/products"
}
Data Extraction
The API returns structured JSON data matching the shape defined in the query. Lists, nested objects, and type conversions are supported.
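A request body like the one above can be assembled programmatically. The following is a small Python sketch that only builds and validates the JSON payload and does not send it. It assumes the brace-wrapped query form used throughout the SDK examples and the two fields query and url shown above. The helper name build_query_payload is hypothetical.

```python
import json


def build_query_payload(query: str, url: str) -> str:
    """Serialize an AgentQL query and a target URL into a JSON request body."""
    query = query.strip()
    if not (query.startswith("{") and query.endswith("}")):
        raise ValueError("query should be wrapped in curly braces")
    if not url.startswith(("http://", "https://")):
        raise ValueError("url should be an absolute http(s) URL")
    # Collapse whitespace so a multi-line query travels as a single line.
    compact_query = " ".join(query.split())
    return json.dumps({"query": compact_query, "url": url})


body = build_query_payload(
    """
    {
        items[] {
            title
            price
            url
        }
    }
    """,
    "https://example.com/products",
)
```

Validating the payload locally catches malformed queries before any request is made.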
SDK vs REST API Comparison
| Feature | Python SDK | JavaScript SDK | REST API |
|---|---|---|---|
| Browser Automation | Yes | Yes | No |
| Query Execution | Yes | Yes | Yes |
| Installation Required | Yes | Yes | No |
| Authentication Support | Via SDK | Via SDK | Via API Key |
| Real-time Interaction | Yes | Yes | No |
| Pagination Handling | Manual | Manual | Manual |
| Rate Limiting | Client-side | Client-side | Server-enforced |
Source: README.md
Configuration Options
When using the REST API, authentication and request configuration are handled through HTTP headers:
| Header | Description | Required |
|---|---|---|
| X-API-Key | API key for authentication | Yes |
| Content-Type | Request payload format (application/json) | Yes |
| Accept | Response format (application/json) | Yes |
SDK Dependencies and Requirements
The SDKs may call REST endpoints internally. For SDK-based implementations, the following dependencies are relevant:
JavaScript SDK
Source: examples/js/package.json
{
"dependencies": {
"agentql": "latest",
"playwright": "^1.48.2",
"playwright-dompath": "^0.0.7"
}
}
Python SDK
The Python SDK uses Playwright as its underlying browser automation framework and communicates with the AgentQL query service.
Source: examples/python/news-aggregator/main.py
from playwright.async_api import async_playwright
import agentql
Common Usage Patterns
Structured Data Extraction
Both SDK and REST API approaches support extracting structured lists from pages:
Source: examples/python/list_query_usage/main.py
QUERY = """
{
products[]
{
name
price(integer)
}
}
"""
Multi-Source Aggregation
The REST API can be called once per source URL to aggregate data from several sites:
Source: examples/python/news-aggregator/main.py
WEBSITE_URLS = [
"https://bsky.app/search?q=agents+for+the+web",
"https://dev.to/search?q=agents%20for%20the+web",
"https://hn.algolia.com/?dateRange=last24h&query=agents%20for%20the%20web",
]
Authentication and Security
The REST API uses API key authentication. Keys are passed in the X-API-Key request header:
curl -X POST https://api.agentql.com/v1/query-data \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"query": "{ title }", "url": "https://example.com"}'
Limitations and Considerations
Edge Environment Compatibility
The REST API is particularly useful in edge environments where full SDK installation is not possible. However, issues have been reported when combining JavaScript SDK with Cloudflare's Browser Rendering feature, as some Node.js APIs may not be available in edge runtime environments.
Source: Issue #128: AgentQL (JS) x Cloudflare's Browser Rendering
Element Resolution
When using queries that resolve elements, some elements may be resolved as generic containers (like <span>) rather than the expected semantic elements. This can affect data extraction accuracy.
Source: Issue #121: querying element resolved as useless span
Documentation Links
When referencing examples or tutorials, ensure you use the correct documentation paths. Some older links may point to incorrect directories.
Source: Issue #64: Invalid Link | Documentation > Examples > Collab
Integration with Agent Frameworks
The REST API can be integrated with various agent frameworks as a lightweight alternative to SDK-based approaches. External services like run.pay have expressed interest in using AgentQL for autonomous AI agents to perform web interactions.
Source: Issue #153: Monetize AgentQL with run.pay
AgentQL Query Language
Related topics: Query Examples and Patterns
Overview
The AgentQL Query Language is a domain-specific query language designed to extract structured data and locate DOM elements on web pages using natural language descriptions. It serves as the core abstraction layer that enables AI agents and LLMs to interact with web content in a robust, maintainable way that survives UI changes.
AgentQL queries are declarative, resembling a subset of GraphQL syntax, and support both element location and data extraction within a single unified syntax. Source: README.md:1-10
Core Concepts
Query Types
AgentQL distinguishes between two primary query operations:
| Query Type | Purpose | SDK Method | Returns |
|---|---|---|---|
| Element Query | Locate DOM elements for interaction | query_elements() | Playwright Locator objects |
| Data Query | Extract structured data from the page | query_data() | Dictionary/object with extracted values |
Source: examples/python/first_steps/main.py:35-55
Natural Language Selectors
Unlike traditional CSS selectors or XPath, AgentQL uses natural language to describe what elements or data to find. This approach provides:
- Intuitive element discovery — Describe elements by their purpose or content rather than markup structure
- Cross-site compatibility — The same query can work across different websites with similar content
- Self-healing resilience — When UI structure changes, natural language queries adapt automatically
Source: README.md:8-15
Query Syntax Reference
Basic Structure
Queries are defined as multi-line strings using a GraphQL-like syntax:
{
element_name
}
Source: examples/python/first_steps/main.py:22-25
Object and Field Selection
Nested objects are queried using brace notation. Fields within objects return their text content or attribute values:
{
price_currency
products[] {
name
price
}
}
Source: examples/python/first_steps/main.py:28-35
Array Syntax
The [] suffix denotes arrays/lists of items. This syntax extracts multiple items matching the query pattern:
{
products[] {
name
price
}
}
Source: examples/python/list_query_usage/README.md:1-15
Transforms
Transforms are applied inline to convert extracted data to specific types or formats. The transform name follows the field in parentheses:
{
products[] {
name
price(integer)
}
}
In this example, price(integer) instructs AgentQL to extract the price text and convert it to an integer. Source: examples/python/first_steps/main.py:33
Natural Language Prompts
For element location, you can use free-form natural language descriptions via the get_by_prompt() method:
NATURAL_LANGUAGE_PROMPT = "Button to display Qwilfish page"
qwilfish_page_btn = page.get_by_prompt(NATURAL_LANGUAGE_PROMPT)
Source: examples/python/first_steps/main.py:37-39
Usage Patterns
Python SDK Pattern
import agentql
from agentql.ext.playwright.sync_api import Page
# Wrap Playwright page for AgentQL capabilities
page = agentql.wrap(browser.new_page())
page.goto(URL)
# Define query
SEARCH_BOX_QUERY = """
{
search_product_box
}
"""
# Locate element for interaction
response = page.query_elements(SEARCH_BOX_QUERY)
response.search_product_box.type("fish", delay=200)
# Extract data
PRODUCT_DATA_QUERY = """
{
price_currency
products[] {
name
price(integer)
}
}
"""
data = page.query_data(PRODUCT_DATA_QUERY)
Source: examples/python/first_steps/main.py:1-60
JavaScript SDK Pattern
const { wrap } = require('agentql');
const page = await browser.newPage();
// Wrap the Playwright page to add AgentQL query methods
const wrappedPage = await wrap(page);
await wrappedPage.goto(URL);
// Use same query syntax
const response = await wrappedPage.queryData(`
{
price_currency
products[] {
name
price
}
}
`);
Source: examples/js/first-steps/README.md:1-20
Common Use Cases
Collecting Paginated Data
For paginated content, queries can be combined with navigation logic to collect data across multiple pages:
# Extract data from the current page and keep only the product list
data = page.query_data(PRODUCT_DATA_QUERY)
all_data.extend(data["products"])
# Locate the next-page control and click it
response = page.query_elements("{ next_page_button }")
response.next_page_button.click()
Source: examples/python/collect_paginated_news_headlines/README.md:1-20
Form Interaction
Queries locate form fields and buttons for automated interaction:
{
username_field
password_field
submit_button
}
Source: examples/js/submit-form/README.md:1-20
Web Scraping with Structured Output
Queries define the exact shape of extracted data:
QUERY = """
{
price_currency
products[] {
name
price(integer)
}
}
"""
data = page.query_data(QUERY)
# Returns: { "price_currency": "USD", "products": [{ "name": "Item", "price": 29 }] }
Source: examples/python/collect_ecommerce_pricing_data/README.md:1-20
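Because the query fixes the shape of the result, the returned dictionary can be converted into typed objects before further processing. The sketch below assumes the result shape shown in the comment above. The Product dataclass and parse_products helper are hypothetical names, not part of the AgentQL SDK.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Product:
    name: str
    price: int


def parse_products(data: dict[str, Any]) -> tuple[str, list[Product]]:
    """Convert a query_data()-style result into a currency and typed products."""
    currency = str(data["price_currency"])
    products = []
    for raw in data["products"]:
        # int() also accepts numeric strings, in case a value arrives as "29".
        products.append(Product(name=str(raw["name"]), price=int(raw["price"])))
    return currency, products


# Sample data standing in for a real query result.
sample = {"price_currency": "USD", "products": [{"name": "Item", "price": 29}]}
currency, products = parse_products(sample)
```

A missing key raises KeyError here, which surfaces a changed page or query early instead of passing incomplete rows along.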
Architecture
graph TD
A[Developer writes<br/>AgentQL Query] --> B[AgentQL SDK sends<br/>query to API]
B --> C[LLM interprets<br/>query semantically]
C --> D[AgentQL returns<br/>element locators<br/>or extracted data]
D --> E[SDK provides<br/>typed response]
E --> F[query_elements<br/>returns Locators]
E --> G[query_data<br/>returns structured data]
F --> H[Playwright<br/>interacts with DOM]
G --> I[Structured dict<br/>for downstream use]
style A fill:#e1f5fe
style H fill:#fff3e0
style I fill:#e8f5e9
Key Features Summary
| Feature | Description |
|---|---|
| Natural language selectors | Describe elements by purpose, not CSS/XPath |
| Structured output | Define exact data shape in queries |
| Inline transforms | Convert data types during extraction |
| Array support | Query lists with [] syntax |
| Cross-site compatibility | Same queries work across similar sites |
| Self-healing | Queries adapt when UI changes |
Source: README.md:8-15
Integration Points
Playwright Integration
AgentQL wraps Playwright page objects to provide query capabilities while preserving full Playwright API access:
page = agentql.wrap(browser.new_page())
# Use both AgentQL and Playwright methods
response = page.query_elements(QUERY)
response.some_element.click() # Playwright API
page.keyboard.press("Enter") # Playwright API
Source: examples/python/first_steps/main.py:41-48
SDK Availability
| SDK | Installation Guide | Use Case |
|---|---|---|
| Python SDK | docs.agentql.com | Automation, scraping |
| JavaScript SDK | docs.agentql.com | Node.js automation |
Source: README.md:20-30
Best Practices
- Use descriptive field names — Match query field names to content purpose rather than HTML attributes
- Apply transforms early — Convert data types in queries rather than post-processing
- Test with debugger extension — Use the AgentQL Debugger Chrome Extension to refine queries interactively
- Leverage natural language prompts — For complex element location, get_by_prompt() often provides better resilience than structured queries
Source: examples/python/list_query_usage/README.md:1-20
Debugging Queries
Install the AgentQL Debugger Chrome Extension to:
- Test queries in real-time on live sites
- View element matches and confidence scores
- Export optimized queries to Python or JavaScript
Source: examples/python/first_steps/main.py:1-10
Related Documentation
- AgentQL Query Language Docs
- Python SDK Installation
- JavaScript SDK Installation
- REST API Reference
- Playground for interactive query testing
Query Examples and Patterns
Related topics: AgentQL Query Language, Data Collection Patterns
AgentQL provides a powerful query language that enables AI agents and LLMs to interact with web pages in a natural, resilient way. This page covers practical examples and common patterns for writing effective queries to extract data and locate elements on web pages.
Overview
AgentQL queries are structured, GraphQL-like expressions that define what data to extract or what elements to locate on a webpage. The query language supports:
- Natural language selectors that find elements based on semantic meaning
- Structured data extraction with typed transformations
- List/array queries for extracting multiple items
- Cross-site compatibility for reuse across similar websites
Source: README.md
Core Query Methods
AgentQL provides three primary API methods for interacting with web pages after wrapping a Playwright page object:
| Method | Purpose | Returns |
|---|---|---|
| query_data() | Extract structured data from the page | Dictionary with extracted fields |
| query_elements() | Locate DOM elements for interaction | Element references for actions |
| get_by_prompt() | Find elements using natural language prompts | Element reference |
Source: examples/python/first_steps/main.py:54-77
Python SDK Usage
import agentql
from agentql.ext.playwright.sync_api import Page
from playwright.sync_api import sync_playwright
def main():
    with sync_playwright() as playwright:
        # Launch a browser before creating the page
        browser = playwright.chromium.launch(headless=False)
        page = agentql.wrap(browser.new_page())
        page.goto(URL)
        # Extract data
        data = page.query_data(PRODUCT_DATA_QUERY)
        # Locate elements for interaction
        response = page.query_elements(SEARCH_BOX_QUERY)
        browser.close()
Source: examples/python/first_steps/main.py:1-45
JavaScript SDK Usage
const { wrap } = require('agentql');
const { chromium } = require('playwright');
async function main() {
const browser = await chromium.launch();
const page = await wrap(await browser.newPage());
await page.goto(URL);
const data = await page.queryData(query);
}
Source: examples/js/news-aggregator/main.js:1-20
List Queries
List queries allow extraction of multiple items from a page, such as product listings, news headlines, or any repeating content.
Basic List Query Pattern
Use the [] syntax to query arrays of items:
{
products[]
{
name
price(integer)
}
}
Source: examples/python/list_query_usage/main.py:15-21
Python List Query Example
QUERY = """
{
products[]
{
name
price(integer)
}
}
"""
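def main():
    with sync_playwright() as playwright:
        # Launch a browser before creating the page
        browser = playwright.chromium.launch(headless=False)
        page = agentql.wrap(browser.new_page())
        page.goto(URL)
        response = page.query_data(QUERY)
        # Write the extracted products to a file (CSV_FILE_PATH is defined by the caller)
        with open(CSV_FILE_PATH, "w", encoding="utf-8") as file:
            # Iterate over extracted products
            for product in response["products"]:
                file.write(f"{product['name']},{product['price']}\n")
        browser.close()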
Source: examples/python/list_query_usage/main.py:1-40
JavaScript List Query Example
const query = `
{
items(might be articles, posts, tweets)[]
{
published_date(convert to XX/XX/XXXX format)
entry(title or post if no title is available)
author(person's name; return "n/a" if not available)
outlet(the original platform it is posted on)
url
}
}
`;
const data = await page.queryData(query);
Source: examples/js/news-aggregator/main.js:10-19
Data Transformations
AgentQL supports inline transformations within queries to convert data types or format values.
Type Conversions
Use (type) syntax to convert extracted values:
| Transformation | Example | Description |
|---|---|---|
| (integer) | price(integer) | Convert string to integer |
| (float) | rating(float) | Convert to decimal number |
| (string) | date(string) | Ensure string output |
Source: examples/python/first_steps/main.py:34-35
Format Instructions
Include format hints directly in the query:
{
published_date(convert to XX/XX/XXXX format)
entry(title or post if no title is available)
}
Source: examples/js/news-aggregator/main.js:12-13
Natural Language Element Location
The get_by_prompt() method uses natural language to find elements, making queries resilient to UI changes.
Finding Elements with Prompts
NATURAL_LANGUAGE_PROMPT = "Button to display Qwilfish page"
def _add_qwilfish_to_cart(page: Page):
    """Add Qwilfish to cart with AgentQL Smart Locator API."""
    # Find DOM element using natural language prompt
    qwilfish_page_btn = page.get_by_prompt(NATURAL_LANGUAGE_PROMPT)
    # Interact with the element using Playwright API
    qwilfish_page_btn.click()
Source: examples/python/first_steps/main.py:79-88
Handling Dynamic Content
Infinite Scroll Patterns
Pages that load content based on scroll position require simulating scroll events:
def key_press_end_scroll(page):
    """Scroll to the end of the page by pressing End key."""
    page.keyboard.press("End")
def mouse_wheel_scroll(page):
    """Alternative scroll using mouse wheel for different page behaviors."""
    page.mouse.wheel(0, 3000)
Source: examples/python/infinite_scroll/README.md
Note: Scrolling to the end of a page by pressing the End key is not always reliable. Some pages have multiple scrollable areas, or the End key may be mapped to different functions. Test both key_press_end_scroll() and mouse_wheel_scroll() to find what works for your target site.
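Either scroll helper usually has to be called repeatedly until no new content loads. A minimal sketch of that loop follows, assuming the scroll action and a page-height probe are passed in as callables. The names scroll_until_stable and FakePage are hypothetical, and FakePage only simulates a page that loads two extra batches of content.

```python
from typing import Callable


def scroll_until_stable(
    scroll: Callable[[], None],
    get_height: Callable[[], int],
    max_rounds: int = 20,
    stable_rounds: int = 2,
) -> int:
    """Scroll until the page height stops growing for `stable_rounds`
    consecutive rounds. Returns the number of scrolls performed."""
    unchanged = 0
    last_height = get_height()
    for rounds in range(1, max_rounds + 1):
        scroll()
        height = get_height()
        unchanged = unchanged + 1 if height == last_height else 0
        last_height = height
        if unchanged >= stable_rounds:
            return rounds
    return max_rounds


class FakePage:
    """Simulated page that grows by 1000px on each of its first two scrolls."""

    def __init__(self) -> None:
        self.height = 1000
        self.loads_left = 2

    def scroll(self) -> None:
        if self.loads_left > 0:
            self.height += 1000
            self.loads_left -= 1


fake = FakePage()
rounds = scroll_until_stable(fake.scroll, lambda: fake.height)
```

With Playwright, scroll could be a lambda calling page.mouse.wheel(0, 3000) and get_height a lambda calling page.evaluate("document.body.scrollHeight"), with a short wait between rounds so new content has time to load.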
Paginated Data Collection
For pages with explicit pagination, iterate through pages while collecting data:
async def collect_paginated_data(page, pages_to_collect):
    """Collect data from multiple paginated pages."""
    all_data = []
    for page_num in range(pages_to_collect):
        data = await page.query_data(QUERY)
        all_data.extend(data["items"])
        # Navigate to next page
        await page.click("[aria-label='Next']")
        await page.wait_for_load_state("networkidle")
    return all_data
Source: examples/python/collect_paginated_news_headlines/README.md
Concurrent Data Collection
Fetch data from multiple URLs concurrently within the same browser session:
async def main():
    WEBSITE_URLS = [
        "https://bsky.app/search?q=agents+for+the+web",
        "https://dev.to/search?q=agents%20for%20the+web",
        "https://hn.algolia.com/?q=agents%20for%20the+web",
    ]
    async with async_playwright() as p:
        async with await p.chromium.launch(headless=True) as browser:
            async with await browser.new_context() as context:
                await asyncio.gather(
                    *(fetch_data(context, url) for url in WEBSITE_URLS)
                )
Source: examples/python/news-aggregator/main.py:1-30
Data Export Patterns
CSV Export
import os
SCRIPT_DIR = os.path.dirname(os.path.abspath(__file__))
CSV_FILE_PATH = os.path.join(SCRIPT_DIR, "news_headlines.csv")
def export_to_csv(data):
    with open(CSV_FILE_PATH, "w", encoding="utf-8") as file:
        file.write("Name, Price\n")
        for product in data["products"]:
            file.write(f"{product['name']},{product['price']}\n")
Source: examples/python/list_query_usage/main.py:24-33
Cleaning Data for Export
When exporting to a delimited text file, strip the delimiter character from field values so it cannot break the column layout:
for item in data["items"]:
    # Strip '|' from entry to avoid CSV formatting issues
    clean_entry = item["entry"].replace("|", "")
    new_lines.append(
        f"{item['published_date']} | {clean_entry} | {item['url']}\n"
    )
Source: examples/python/news-aggregator/main.py:45-50
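An alternative to stripping delimiter characters by hand is to let Python's standard csv module quote fields as needed. The sketch below assumes the items shape used in the news aggregator query. The helper write_items_csv is a hypothetical name, and an in-memory buffer stands in for a real file.

```python
import csv
import io


def write_items_csv(items, fieldnames, stream) -> None:
    """Write dict rows as CSV; fields containing commas or quotes are quoted."""
    writer = csv.DictWriter(stream, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    for item in items:
        writer.writerow(item)


# Sample data standing in for a real query result.
items = [
    {
        "published_date": "01/02/2025",
        "entry": "Agents | for the web, explained",
        "url": "https://example.com/a",
    },
]
buffer = io.StringIO()
write_items_csv(items, ["published_date", "entry", "url"], buffer)
output = buffer.getvalue()
```

When writing to a real file, open it with newline="" as the csv module documentation recommends, so line endings are not translated twice.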
Query Workflow Diagram
graph TD
A[Initialize Browser with Playwright] --> B[Wrap Page with AgentQL]
B --> C[Navigate to Target URL]
C --> D{Select Query Method}
D -->|Extract Data| E[Use query_data with QUERY]
D -->|Locate Elements| F[Use query_elements or get_by_prompt]
E --> G[Process Results]
F --> H[Interact with Elements via Playwright]
H --> G
G --> I{More Pages?}
I -->|Yes| C
I -->|No| J[Export/Return Results]
Common Query Patterns Summary
| Pattern | Use Case | Example Query |
|---|---|---|
| List extraction | Products, articles, items | products[] { name, price } |
| Type conversion | Numeric data | price(integer) |
| Format hints | Date formatting | date(convert to MM/DD/YYYY) |
| Flexible matching | Ambiguous content | items(might be articles)[] |
| Natural language | Element location | get_by_prompt("Submit button") |
Working with the AgentQL Debugger
The AgentQL Debugger Chrome extension allows you to:
- Test queries interactively on any webpage
- Refine natural language selectors
- Verify element selection before writing scripts
Install the extension and use it to experiment with queries before integrating them into your scripts.
Best Practices
- Start with the Debugger - Test queries in the Chrome extension before coding
- Use type conversions - Specify (integer) or (float) for numeric fields
- Handle edge cases - Use format instructions like return "n/a" if not available
- Clean exported data - Remove special characters before CSV export
- Test pagination - Verify scroll and navigation methods work for your target site
- Use natural language sparingly - Reserve get_by_prompt() for complex or dynamic selectors
Browser Modes and Configuration
Related topics: Integrations and Framework Connections
AgentQL provides flexible browser configuration options through its integration with Playwright, enabling developers to customize browser behavior for various use cases including headless automation, stealth operations, human-like interaction patterns, and remote browser connections.
Overview
Browser modes in AgentQL determine how the underlying Playwright browser instance operates during data extraction and automation tasks. The configuration system supports multiple deployment scenarios ranging from fully automated server-side operations to interactive debugging sessions.
The core browser configuration is handled through the agentql.wrap() function for synchronous operations and agentql.wrap_async() for asynchronous workflows, which accept a Playwright page object and enable AgentQL's query capabilities on top of it.
Source: examples/python/news-aggregator/main_sync.py
Browser Launch Configuration
Standard Browser Launch
The most common approach involves launching a browser instance directly within the script using Playwright's launch API. This provides full control over browser settings and lifecycle management.
from playwright.sync_api import sync_playwright
import agentql
with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    context = browser.new_context()
    page = agentql.wrap(context.new_page())
    # Perform operations
    browser.close()
Source: examples/python/news-aggregator/main_sync.py
Asynchronous Browser Launch
For applications requiring concurrent operations, AgentQL supports asynchronous browser management through Python's asyncio:
import asyncio
from playwright.async_api import async_playwright
import agentql
async def main():
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        context = await browser.new_context()
        page = await agentql.wrap_async(context.new_page())
        await page.goto(url)
        # Perform operations
        await browser.close()
Source: examples/python/news-aggregator/main.py
Headless Mode
Headless mode runs the browser without a visible UI window, making it ideal for server-side automation, continuous integration pipelines, and resource-constrained environments. AgentQL examples consistently demonstrate headless configuration for production deployments.
Configuration Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| headless | boolean | true | Controls UI visibility |
| args | list | [] | Chromium command-line arguments |
| downloads_path | string | None | Directory for download operations |
Source: examples/python/collect_paginated_news_headlines/README.md
Headless Browser Workflow
graph TD
A[Initialize Playwright] --> B[Launch Chromium with headless=True]
B --> C[Create Browser Context]
C --> D[Wrap Page with AgentQL]
D --> E[Execute Query Operations]
E --> F[Close Browser]
Source: examples/python/run_script_in_headless_browser/main.py
Stealth Mode
Stealth mode configures the browser to minimize detection by anti-bot systems. This involves modifying browser attributes and behaviors that automated browsers typically expose.
Implementation Example
The stealth mode example demonstrates configuration to avoid common automation detection vectors:
from playwright.sync_api import sync_playwright
import agentql
with sync_playwright() as p:
    browser = p.chromium.launch(
        headless=True,
        # Hide the AutomationControlled hint exposed to page scripts
        args=['--disable-blink-features=AutomationControlled'],
        # Drop Playwright's default --enable-automation switch
        ignore_default_args=['--enable-automation'],
    )
    context = browser.new_context()
    # Additional stealth configurations
    page = agentql.wrap(context.new_page())
Source: examples/python/stealth_mode/main.py
Stealth Configuration Options
| Configuration | Purpose | Implementation |
|---|---|---|
| AutomationControlled flag | Hide webdriver presence | Chromium launch arguments |
| User agent spoofing | Match real browser signatures | Browser context settings |
| Navigator properties | Normalize exposed JavaScript values | Page.evaluate() modifications |
Source: examples/python/stealth_mode/main.py
Humanlike Mode and Anti-Bot Evasion
Humanlike mode simulates genuine user behavior to evade anti-bot detection systems. This includes randomizing interaction timing, mimicking scroll patterns, and implementing natural mouse movements.
Python Implementation
import random
import time
from playwright.sync_api import sync_playwright
import agentql
def humanlike_scroll(page):
    """Simulate natural scrolling behavior"""
    scroll_amount = random.randint(300, 800)
    page.evaluate(f'window.scrollBy(0, {scroll_amount})')
    time.sleep(random.uniform(0.5, 2.0))
with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    page = agentql.wrap(browser.new_page())
    # Apply humanlike interaction patterns
    page.goto(target_url)
    for _ in range(random.randint(2, 5)):
        humanlike_scroll(page)
Source: examples/python/humanlike-antibot/main.py
JavaScript Implementation
const { wrap, configure } = require('agentql');
const { chromium } = require('playwright');

// Placeholder; replace with the page you want to visit
const url = 'https://example.com';

// Resolve after a random pause of 500 to 2499 ms
async function humanlikeDelay() {
  const delay = Math.floor(Math.random() * 2000) + 500;
  return new Promise(resolve => setTimeout(resolve, delay));
}

async function main() {
  const browser = await chromium.launch({ headless: false });
  const page = await wrap(await browser.newPage());
  await page.goto(url);
  await humanlikeDelay();
}

main();
Source: examples/js/humanlike-antibot/main.js
Humanlike Interaction Patterns
| Pattern | Description | Anti-Bot Impact |
|---|---|---|
| Random delays | Variable wait times between actions | Prevents uniform timing detection |
| Variable scroll | Randomized scroll distances and speeds | Mimics human browsing behavior |
| Mouse movements | Non-linear cursor paths | Evades motion tracking systems |
| Typing simulation | Randomized keystroke intervals | Avoids robotic typing detection |
Source: examples/python/humanlike-antibot/main.py
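The "Typing simulation" row can be sketched as follows. This is a minimal sketch, not code from the humanlike example: `keystroke_delays` and `type_like_human` are hypothetical helper names, the 60 to 220 ms range is an arbitrary assumption, and the `page` argument is assumed to be a Playwright (or AgentQL-wrapped) sync page.

```python
import random


def keystroke_delays(text: str, rng: random.Random, low_ms: int = 60, high_ms: int = 220) -> list:
    """Pick one random pause (in milliseconds) per character."""
    return [rng.randint(low_ms, high_ms) for _ in text]


def type_like_human(page, selector: str, text: str, rng: random.Random) -> None:
    """Click a field, then type character by character with uneven pauses."""
    page.click(selector)
    for char, pause_ms in zip(text, keystroke_delays(text, rng)):
        page.keyboard.type(char)         # send a single character
        page.wait_for_timeout(pause_ms)  # wait a random interval before the next key
```

Passing a seeded `random.Random` makes a run reproducible while debugging; an unseeded instance gives different timing on every run.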
Remote Browser Connection
AgentQL supports connecting to existing browser instances running remotely, which is essential for Cloudflare Browser Rendering integration and distributed scraping architectures.
Connection Workflow
graph LR
A[Start Remote Browser<br/>with debugging port] --> B[Connect via<br/>WebSocket URL]
B --> C[Create AgentQL Page]
C --> D[Execute Queries]
D --> E[Retrieve Results]
Source: examples/js/use-existing-browser/README.md
WebSocket Connection Format
Remote browser connections use the WebSocket debugging protocol:
ws://127.0.0.1:9222/devtools/browser/{browser-id}
Source: examples/python/use_existing_browser/README.md
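The `{browser-id}` part changes each time the browser starts. A Chromium-based browser launched with `--remote-debugging-port=9222` reports its current URL at `http://127.0.0.1:9222/json/version` in the `webSocketDebuggerUrl` field. The sketch below reads that field; the helper names and the sample payload (with a made-up browser id) are assumptions for illustration.

```python
import json
from urllib.request import urlopen


def extract_ws_url(version_payload: str) -> str:
    """Pull the browser-level WebSocket URL out of a /json/version response body."""
    info = json.loads(version_payload)
    return info["webSocketDebuggerUrl"]


def fetch_ws_url(host: str = "127.0.0.1", port: int = 9222) -> str:
    """Ask a running browser for its WebSocket debugging URL."""
    with urlopen(f"http://{host}:{port}/json/version", timeout=5) as response:
        return extract_ws_url(response.read().decode("utf-8"))


# Sample payload with a made-up browser id, for illustration only.
sample = (
    '{"Browser": "Chrome/120.0", '
    '"webSocketDebuggerUrl": "ws://127.0.0.1:9222/devtools/browser/abc-123"}'
)
ws_url = extract_ws_url(sample)
```

The value returned by `fetch_ws_url()` can then be used in place of a hard-coded remote browser URL.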
Python Remote Browser Usage
import agentql
from playwright.sync_api import sync_playwright

# Connect to existing browser via DevTools URL
REMOTE_BROWSER_URL = "ws://127.0.0.1:9222/devtools/browser/387adf4c-243f-4051-a181-46798f4a46f4"

with sync_playwright() as p:
    # Connect to the remote browser instead of launching
    browser = p.chromium.connect_over_cdp(REMOTE_BROWSER_URL)
    context = browser.new_context()
    page = agentql.wrap(context.new_page())
    # Navigate to pages within the connected browser
    page.goto("https://scrapeme.live/shop/Charmander/")
    data = page.query_data(QUERY)
Source: examples/python/use_remote_browser/main.py
JavaScript Remote Browser Usage
const { wrap, configure } = require('agentql');
const { chromium } = require('playwright');

const REMOTE_BROWSER_URL = 'ws://127.0.0.1:9222/devtools/browser/387adf4c-243f-4051-a181-46798f4a46f4';

async function main() {
  // Connect to existing browser instance
  const browser = await chromium.connectOverCDP(REMOTE_BROWSER_URL);
  const page = await wrap(await browser.newPage());
  await page.goto('https://scrapeme.live/shop/Charmander/');
  const data = await page.queryData(QUERY);
}
Source: examples/js/use-existing-browser/README.md
Browser Context Configuration
Browser contexts provide isolation between browsing sessions, enabling parallel operations and independent cookie/storage management.
Context Options
| Option | Type | Description |
|---|---|---|
| viewport | dict | Browser window dimensions |
| user_agent | string | Custom user agent string |
| locale | string | Browser locale setting |
| timezone_id | string | Simulated timezone |
| permissions | list | Granted permissions |
| ignore_https_errors | boolean | SSL certificate handling |
Source: examples/js/package.json
Multiple Context Example
from playwright.sync_api import sync_playwright
import agentql

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    # Create multiple independent contexts
    context1 = browser.new_context(
        viewport={'width': 1920, 'height': 1080},
        user_agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
    )
    context2 = browser.new_context(
        viewport={'width': 1366, 'height': 768},
        locale='en-GB'
    )
    page1 = agentql.wrap(context1.new_page())
    page2 = agentql.wrap(context2.new_page())
Source: examples/python/news-aggregator/main_sync.py
API Key Configuration
AgentQL requires API key configuration for cloud-based query execution. The configuration can be set explicitly or rely on environment variables.
Python Configuration
from agentql import configure
# Set API key explicitly
configure(api_key="your-agentql-api-key")
JavaScript Configuration
const { wrap, configure } = require('agentql');

// Configure API key
configure({
  apiKey: process.env.AGENTQL_API_KEY
});
Source: examples/js/get-by-prompt/main.js
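For the environment-variable route in Python, a small guard can fail early when the key is missing. This is a sketch under assumptions: `load_api_key` is a hypothetical helper, and `AGENTQL_API_KEY` is the variable name used in the JavaScript snippet above.

```python
import os


def load_api_key(env=None) -> str:
    """Read AGENTQL_API_KEY from the environment, failing fast if it is missing."""
    env = os.environ if env is None else env
    key = env.get("AGENTQL_API_KEY", "").strip()
    if not key:
        raise RuntimeError("AGENTQL_API_KEY is not set")
    return key
```

A script would then call `configure(api_key=load_api_key())` once at startup, which keeps the key out of source control.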
Page Navigation and Waiting
Proper page load handling is crucial for reliable data extraction across different website architectures.
Wait Strategies
| Strategy | Use Case | Implementation |
|---|---|---|
| networkidle | SPA with dynamic content | page.wait_for_load_state('networkidle') |
| domcontentloaded | Simple pages | page.goto(url, wait_until='domcontentloaded') |
| commit | Fast redirects | page.goto(url, wait_until='commit') |
| timeout | Slow connections | page.goto(url, timeout=60000) (Playwright's default is 30000 ms) |
Source: examples/python/collect_paginated_news_headlines/README.md
Navigation with AgentQL
import agentql
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = agentql.wrap(browser.new_page())
    # Standard navigation
    page.goto("https://example.com")
    # Wait for dynamic content
    page.wait_for_load_state('networkidle')
    # Execute query after page is ready
    data = page.query_data(QUERY)
Source: examples/js/collect-paginated-news-headlines/README.md
Best Practices
Mode Selection Guidelines
- Headless Mode: Use for production deployments, CI/CD pipelines, and server-side automation where no user interaction is needed
- Stealth Mode: Apply when targeting sites with anti-bot measures that check for automation indicators
- Humanlike Mode: Reserve for high-security targets requiring behavioral analysis evasion
- Remote Browser: Employ when debugging, testing across specific browser versions, or integrating with cloud browser services
Security Considerations
Community issue #128 discusses the challenges of using AgentQL with Cloudflare's Browser Rendering in edge environments. Some Node.js APIs behave differently in edge contexts, requiring adaptation of browser configuration code.
Source: github.com/tinyfish-io/agentql/issues/128
Performance Optimization
| Technique | Impact | Implementation |
|---|---|---|
| Context reuse | Reduces memory overhead | Reuse contexts for related pages |
| Async operations | Improves throughput | Use wrap_async() for concurrent tasks |
| Headless mode | Reduces resource usage | Default to headless=True |
| Selective waits | Faster execution | Use specific wait conditions over timeouts |
Source: https://github.com/tinyfish-io/agentql / Human Manual
Data Collection Patterns
Related topics: Query Examples and Patterns, Integrations and Framework Connections
AgentQL provides robust patterns for collecting structured data from websites. These patterns leverage the query language's natural language selectors and structured output capabilities to extract data reliably across different page layouts and UI changes.
Overview
Data collection in AgentQL revolves around extracting structured information from web pages using queries that define the expected data shape. The patterns demonstrated in the examples cover common scenarios including paginated data collection, multi-URL aggregation, and list extraction with transformations.
Pagination Patterns
Pagination patterns enable collecting data that spans multiple pages, a common requirement for e-commerce listings, news archives, and search results.
Python Implementation
The paginated data collection pattern uses a loop structure that:
- Navigates to the initial page
- Extracts data using query_data() with a structured query
- Detects pagination elements to proceed to the next page
- Continues until no more pages exist or a limit is reached
# Source: examples/python/collect_paginated_ecommerce_listing_data/main.py
from playwright.async_api import async_playwright
import agentql

URL = "https://scrapeme.live/shop"

async def main():
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        page = await agentql.wrap_async(browser.new_page())
        all_products = []
        current_page = 1
        max_pages = 5
        while current_page <= max_pages:
            await page.goto(f"{URL}/page/{current_page}/")
            # Query structured data from the page
            # (PRODUCT_DATA_QUERY is shown under "Basic List Query")
            data = await page.query_data(PRODUCT_DATA_QUERY)
            all_products.extend(data.get("products", []))
            current_page += 1
JavaScript Implementation
// Source: examples/js/collect-paginated-ecommerce-data/main.js
const { chromium } = require('playwright');
const agentql = require('agentql');

// maxPages, baseUrl, and PRODUCT_QUERY are assumed to be defined earlier in the script
(async () => {
  const browser = await chromium.launch({ headless: true });
  // newPage() returns a promise, so resolve it before wrapping
  const page = await agentql.wrap(await browser.newPage());
  let pageNum = 1;
  const allProducts = [];
  while (pageNum <= maxPages) {
    await page.goto(`${baseUrl}?page=${pageNum}`);
    const data = await page.queryData(PRODUCT_QUERY);
    allProducts.push(...data.products);
    pageNum++;
  }
})();
Pagination Query Structure
| Element | Query Field | Purpose |
|---|---|---|
| Product cards | products[] | Array of product items on each page |
| Pagination control | next_page_button | Element to click for next page |
| Item counter | total_items | Total count displayed on page |
| Page indicator | current_page | Current page number |
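The fields in the table can also drive a click-based loop instead of URL-based paging. The sketch below is an illustration under assumptions: it uses the sync API, the helper name `collect_all_pages` is hypothetical, and it assumes that `query_elements()` exposes `next_page_button` as an attribute that is `None` when no such control is found.

```python
PAGE_QUERY = """
{
    products[] {
        name
        price(integer)
    }
}
"""

NEXT_QUERY = """
{
    next_page_button
}
"""


def collect_all_pages(page, max_pages: int = 5) -> list:
    """Gather products page by page, clicking the next-page control each time."""
    products = []
    for _ in range(max_pages):
        data = page.query_data(PAGE_QUERY)
        products.extend(data.get("products", []))
        controls = page.query_elements(NEXT_QUERY)
        next_button = controls.next_page_button
        if next_button is None:  # assumed: a missing element comes back as None
            break
        next_button.click()
        page.wait_for_load_state("networkidle")
    return products
```

The `max_pages` cap bounds the loop even if the site keeps offering a next-page control.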
Multi-URL Aggregation Patterns
Collecting data from multiple URLs simultaneously improves efficiency when you need to aggregate information from disparate sources.
Concurrent Tab Collection
The news aggregator example demonstrates opening multiple URLs in separate tabs within the same browser context:
# Source: examples/python/news-aggregator/main.py
WEBSITE_URLS = [
    "https://duckduckgo.com/?q=agents+for+the+web&t=h_&iar=news&ia=news",
    # Additional URLs...
]

async def main():
    async with async_playwright() as p, await p.chromium.launch(
        headless=True
    ) as browser, await browser.new_context() as context:
        # Open multiple tabs concurrently to fetch data
        await asyncio.gather(
            *(fetch_data(context, url) for url in WEBSITE_URLS)
        )
Data Flow Architecture
graph TD
A[Start Browser Context] --> B[Create Multiple Tabs]
B --> C[Concurrent URL Navigation]
C --> D[Query Data per Page]
D --> E[Transform & Clean Data]
E --> F[Write to CSV/JSON]
F --> G[Close Browser]
Handling Multi-Source Data
Each source may return data in different structures. The aggregator normalizes this using AgentQL queries that return consistent field names:
# Source: examples/python/news-aggregator/main.py
QUERY = """
{
    items[] {
        entry
        published_date
        url
        outlet
        author
    }
}
"""
List Extraction Patterns
Extracting lists of items requires defining array fields in the AgentQL query syntax using [] notation.
Basic List Query
# Source: examples/python/first_steps/main.py
PRODUCT_DATA_QUERY = """
{
    price_currency
    products[] {
        name
        price(integer)
    }
}
"""
The products[] notation defines an array of items, where each item contains name and price fields. The (integer) modifier transforms the price string to a numeric type.
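The returned data mirrors the shape of the query, so array fields can be processed with ordinary list operations. The sketch below uses a hypothetical response with made-up values; `total_price` and `format_listing` are illustrative helper names, not part of AgentQL.

```python
# Hypothetical response shaped like PRODUCT_DATA_QUERY; the values are made up.
sample_response = {
    "price_currency": "$",
    "products": [
        {"name": "Bulbasaur", "price": 63},
        {"name": "Ivysaur", "price": 87},
    ],
}


def total_price(response: dict) -> int:
    """Sum the integer prices of every item in the products array."""
    return sum(item["price"] for item in response.get("products", []))


def format_listing(response: dict) -> list:
    """Render each product as '<name>: <currency><price>'."""
    currency = response.get("price_currency", "")
    return [f"{item['name']}: {currency}{item['price']}" for item in response["products"]]
```

Because `price(integer)` asks for a numeric value, the sum works without parsing price strings first.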
Data Transformation During Extraction
AgentQL supports inline transformations within queries:
| Transform | Syntax | Example |
|---|---|---|
| Type conversion | (type) | price(integer), date(date) |
| String cleaning | .strip() | title.strip() |
| Array filtering | [condition] | items[count > 0] |
Maps and Location Data Collection
The maps scraper examples demonstrate collecting geographic and location-based data:
Python Maps Scraper
# Source: examples/python/maps_scraper/main.py
LOCATION_QUERY = """
{
    business_name
    rating
    reviews_count
    address
    phone
    website
    category
}
"""
JavaScript Maps Scraper
// Source: examples/js/maps_scraper/main.js
const LOCATION_QUERY = `
{
  business_name
  rating
  reviews_count
  address
  phone
  website
  category
}
`;
Both implementations follow the same pattern:
- Navigate to the map service URL with search parameters
- Wait for results to load
- Execute the query to extract structured location data
- Store results in the desired format
Data Export Patterns
AgentQL examples demonstrate multiple export formats for collected data.
CSV Export
# Source: examples/python/news-aggregator/main.py
CSV_FILE_PATH = os.path.join(SCRIPT_DIR, "news_headlines.csv")

async def fetch_data(context: BrowserContext, session_url):
    page = await agentql.wrap_async(context.new_page())
    await page.goto(session_url)
    data = await page.query_data(QUERY)
    # Prepare new data with pipe-separated format
    new_lines = []
    for item in data["items"]:
        # Strip '|' from entry to avoid CSV formatting issues
        clean_entry = item["entry"].replace("|", "")
        new_lines.append(
            f"{item['published_date']} | {clean_entry} | {item['url']} | {item['outlet']} | {item['author']}\n"
        )
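An alternative to stripping the delimiter by hand is to let Python's standard `csv` module quote fields that contain it. This is a swapped-in technique, not what the news aggregator example does; `write_rows` and the sample row are hypothetical.

```python
import csv
import io

FIELDS = ["published_date", "entry", "url", "outlet", "author"]


def write_rows(items: list, handle) -> None:
    """Write items as pipe-delimited rows; csv quotes any field that contains '|'."""
    writer = csv.writer(handle, delimiter="|", lineterminator="\n")
    for item in items:
        writer.writerow([item.get(field, "") for field in FIELDS])


# Made-up row whose entry contains the delimiter character.
buffer = io.StringIO()
write_rows(
    [{"published_date": "2024-01-01", "entry": "A | B", "url": "https://example.com",
      "outlet": "Example", "author": "Sam"}],
    buffer,
)
```

Reading the file back with `csv.reader(handle, delimiter="|")` restores the original entry text, so no characters are lost.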
Data Cleaning During Export
| Issue | Solution | Example |
|---|---|---|
| CSV delimiter collision | Strip delimiter characters | `item["entry"].replace("\|", "")` |
| Type inconsistency | Apply transforms in query | price(integer) |
| Missing fields | Provide defaults | field or "N/A" |
| Whitespace | Trim strings | field.strip() |
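The defaulting and trimming rows can be combined into one post-processing step applied to each extracted item. A minimal sketch, assuming items are plain dictionaries; `clean_item` is a hypothetical helper name.

```python
def clean_item(item: dict, fields: list, default: str = "N/A") -> dict:
    """Trim whitespace and substitute a default for missing or empty fields."""
    cleaned = {}
    for field in fields:
        value = item.get(field)
        if isinstance(value, str):
            value = value.strip()
        # Only None and the empty string count as missing, so a price of 0 is kept.
        cleaned[field] = value if value not in (None, "") else default
    return cleaned
```

Checking against `None` and `""` explicitly, rather than using `value or default`, avoids replacing legitimate falsy values such as `0`.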
Error Handling Patterns
Resilient data collection requires proper error handling for network issues, page load failures, and query mismatches.
Try-Except Block Pattern
# Source: examples/python/collect_paginated_news_headlines/main.py
async def collect_headlines(page, query, max_pages=10):
    all_headlines = []
    for page_num in range(1, max_pages + 1):
        try:
            await page.goto(f"{BASE_URL}&page={page_num}")
            await page.wait_for_load_state("networkidle")
            data = await page.query_data(query)
            headlines = data.get("headlines", [])
            if not headlines:
                break  # No more data available
            all_headlines.extend(headlines)
        except Exception as e:
            print(f"Error on page {page_num}: {e}")
            continue
    return all_headlines
Resilience to UI Changes
AgentQL's natural language selectors provide resilience to UI changes. When page structure changes, queries using semantic descriptions continue to work, unlike CSS selectors that break when DOM structure changes.
Best Practices
Query Design
- Use semantic field names: Match query field names to visible content, not DOM attributes
- Define array fields explicitly: Use [] notation for lists of similar items
- Apply transforms early: Use type conversions in queries rather than post-processing
- Handle missing data: Design queries with optional fields using the ? modifier
Performance Optimization
| Technique | Implementation |
|---|---|
| Concurrent tab collection | Use asyncio.gather() for multiple URLs |
| Headless browsing | Set headless=True for server environments |
| Context reuse | Reuse browser contexts to maintain session state |
| Pagination limits | Set maximum page counts to prevent infinite loops |
Cross-Site Compatibility
The same AgentQL query can work across sites with similar content structure. For example, a product listing query designed for one e-commerce site may work on another with minimal modification due to the natural language selector approach.
Source: https://github.com/tinyfish-io/agentql / Human Manual
Integrations and Framework Connections
Related topics: REST API, Browser Modes and Configuration
AgentQL provides flexible integration options with various frameworks, automation tools, and deployment environments. This page covers the available SDKs, framework connections, authentication patterns, and deployment considerations.
Overview
AgentQL connects LLMs and AI agents to the web through its query language and Playwright integrations. The platform offers multiple integration pathways:
| Integration Type | Description |
|---|---|
| Python SDK | Running automation and scraping scripts with AgentQL queries in Python |
| JavaScript SDK | Running automation and scraping scripts with AgentQL queries in JavaScript |
| REST API | Executing queries without an SDK |
| MCP Server | Model Context Protocol integration for AI agents |
| Framework Integrations | Langchain, Zapier, and other automation tools |
SDK Integration Architecture
AgentQL provides seamless integration with Playwright, the browser automation library. Both Python and JavaScript SDKs wrap Playwright's browser context to enable AgentQL querying capabilities.
Python SDK Integration
The Python SDK integrates with Playwright's sync and async APIs. The core integration pattern uses the agentql.wrap() function to extend Playwright page objects with AgentQL querying capabilities.
import agentql
from agentql.ext.playwright.sync_api import Page
from playwright.sync_api import sync_playwright

def main():
    with sync_playwright() as playwright, playwright.chromium.launch(headless=False) as browser:
        page = agentql.wrap(browser.new_page())
        page.goto(URL)
Source: examples/python/first_steps/main.py:1-19
JavaScript SDK Integration
The JavaScript SDK follows a similar pattern, wrapping Playwright page objects to provide AgentQL querying methods.
const { chromium } = require('playwright');
const agentql = require('agentql');

(async () => {
  const browser = await chromium.launch();
  // newPage() returns a promise, so resolve it before wrapping
  const page = await agentql.wrap(await browser.newPage());
  await page.goto('https://example.com');
})();
Source: examples/js/log-into-sites/main.js:1-50
SDK Dependencies
| SDK | Key Dependencies |
|---|---|
| Python SDK | playwright, agentql |
| JavaScript SDK | playwright, playwright-dompath, openai, agentql |
Source: examples/js/package.json:1-30
Authentication and Session Management
AgentQL supports authenticated web interactions through session persistence and browser context management.
Login Pattern
Authentication is achieved by performing login actions before executing AgentQL queries. The pattern involves navigating to the login page, performing credentials entry, and then executing queries within the authenticated session.
async def log_in(page):
    await page.goto(LOGIN_URL)
    await page.fill(USERNAME_SELECTOR, USERNAME)
    await page.fill(PASSWORD_SELECTOR, PASSWORD)
    await page.click(LOGIN_BUTTON)
    await page.wait_for_load_state("networkidle")
Source: examples/python/log_into_sites/main.py:1-60
Session Persistence
Authenticated sessions can be saved and restored using Playwright's storage state mechanism. This allows maintaining login state across script executions.
async def save_authenticated_session(context, storage_path):
    await context.storage_state(path=storage_path)

async def load_authenticated_session(browser, storage_path):
    context = await browser.new_context(storage_state=storage_path)
    return context
Source: examples/python/save_and_load_authenticated_session/main.py:1-80
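Saving and loading can be combined into a single decision: reuse a stored session when the file exists, otherwise log in and store one. The sketch below is an assumption-laden illustration; `get_context` is a hypothetical helper, and `log_in` stands for any caller-supplied coroutine that performs the login on a page.

```python
import os


async def get_context(browser, storage_path: str, log_in):
    """Reuse a saved session if one exists; otherwise log in and save it."""
    if os.path.exists(storage_path):
        # Restore cookies and local storage from the saved file.
        return await browser.new_context(storage_state=storage_path)

    context = await browser.new_context()
    page = await context.new_page()
    await log_in(page)                              # caller-supplied login coroutine
    await context.storage_state(path=storage_path)  # persist the new session
    await page.close()
    return context
```

Saved sessions expire on the server side, so a production script would also need to detect a redirect back to the login page and delete the stale file.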
Session Flow
graph TD
A[Launch Browser] --> B{Check for Existing Session}
B -->|Session Exists| C[Load Storage State]
B -->|No Session| D[Create New Context]
C --> E[Navigate to Target URL]
D --> F[Login to Site]
F --> E
E --> G[Execute AgentQL Queries]
G --> H[Optional: Save Session]
Framework Integrations
LangChain Integration
AgentQL integrates with LangChain for building agent workflows that interact with web pages. The integration allows LangChain agents to use natural language queries that translate to AgentQL queries.
Community Note: The LangChain integration enables AI agents to browse and extract data from websites using natural language instructions.
Zapier Integration
AgentQL provides Zapier integration for no-code automation workflows, enabling users to incorporate web data extraction into automated processes without writing code.
MCP Server
The Model Context Protocol (MCP) server integration allows AI agents to interact with web pages through a standardized protocol. This enables:
- Remote browser control
- Query execution via API
- Integration with AI agent frameworks
External AI Service Integration
AgentQL can be combined with external AI services for advanced data processing, such as sentiment analysis on extracted content.
import os
from openai import OpenAI

def perform_sentiment_analysis(comments):
    # Read the OpenAI key from the environment rather than hard-coding it
    client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
    completion = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": SYSTEM_MESSAGE},
            {"role": "user", "content": USER_MESSAGE},
        ],
    )
    return completion.choices[0].message.content
Source: examples/python/perform_sentiment_analysis/main.py:1-50
Data Processing Pipeline
graph LR
A[Web Page] -->|AgentQL Query| B[Extract Data]
B --> C[Process with LLM]
C -->|Sentiment| D[Analysis Results]
C -->|Summary| E[Content Summary]
Cloudflare Browser Rendering Integration
Community Note: Issue #128 discusses using AgentQL with Cloudflare's Browser Rendering feature, which provides browser instances from Cloudflare Workers via Playwright.
The integration with Cloudflare Browser Rendering enables:
- Edge-based browser automation
- Scalable browser infrastructure
- Serverless web scraping workflows
Edge Environment Considerations
When deploying AgentQL in edge environments like Cloudflare Workers:
- Node.js APIs may have limitations
- CDP (Chrome DevTools Protocol) connection handling differs from standard Node.js
- Browser instance lifecycle management requires careful handling
Source: Issue #128: AgentQL JS x Cloudflare Browser Rendering
REST API Integration
For environments where SDK installation is not feasible, AgentQL provides a REST API for executing queries without an SDK.
| Endpoint Type | Use Case |
|---|---|
| Query Execution | Execute AgentQL queries via HTTP |
| Data Extraction | Retrieve structured data from web pages |
Source: REST API Documentation
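A query sent over HTTP could be assembled roughly as follows. Everything specific here is an assumption to be checked against the REST API documentation: the endpoint URL, the `X-API-Key` header name, and the `url`/`query` body fields. The helper `build_query_request` is hypothetical and only builds the request without sending it.

```python
import json
from urllib.request import Request

# Assumed endpoint and header name; confirm both against the REST API documentation.
QUERY_DATA_ENDPOINT = "https://api.agentql.com/v1/query-data"


def build_query_request(api_key: str, url: str, query: str) -> Request:
    """Build (but do not send) an HTTP POST that asks AgentQL to query a URL."""
    body = json.dumps({"url": url, "query": query}).encode("utf-8")
    return Request(
        QUERY_DATA_ENDPOINT,
        data=body,
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


request = build_query_request(
    "your-agentql-api-key",
    "https://scrapeme.live/shop",
    "{ products[] { name price(integer) } }",
)
```

Passing the request to `urllib.request.urlopen` would send it; the response body would then be parsed with `json.loads`.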
Integration Patterns
Concurrent Data Collection
AgentQL supports concurrent page interactions using async patterns:
async def main():
    async with async_playwright() as p, await p.chromium.launch(headless=True) as browser, await browser.new_context() as context:
        await asyncio.gather(
            *(fetch_data(context, url) for url in WEBSITE_URLS)
        )
Source: examples/python/news-aggregator/main.py:1-40
Pagination Handling
Integration with pagination enables data collection across multiple pages:
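async def collect_paginated_data(page, query):
    all_items = []
    while True:
        data = await page.query_data(query)
        all_items.extend(data["items"])
        # Locate the pagination control by description; the prompt text is illustrative,
        # and get_by_prompt is assumed to return None when nothing matches
        next_button = await page.get_by_prompt("button that goes to the next page")
        if next_button is None or not await next_button.is_visible():
            break
        await next_button.click()
    return all_items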
Multi-Tab Browser Context
For concurrent operations, AgentQL supports multiple tabs within a single browser context:
async def fetch_data(context, url):
    page = await agentql.wrap_async(context.new_page())
    await page.goto(url)
    data = await page.query_data(QUERY)
    return data
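When the URL list is long, opening every tab at once can exhaust memory. One way to cap the number of open tabs is an `asyncio.Semaphore`. The sketch below is a generic pattern, not code from the examples; `gather_limited` is a hypothetical helper and the limit of 3 is arbitrary.

```python
import asyncio


async def gather_limited(fetch, items, limit: int = 3) -> list:
    """Run fetch(item) for every item, with at most `limit` running at once."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(item):
        async with semaphore:
            return await fetch(item)

    # return_exceptions=True puts errors in the result list instead of
    # raising on the first failure.
    return await asyncio.gather(*(guarded(i) for i in items), return_exceptions=True)
```

With the function above it could be called as `await gather_limited(lambda url: fetch_data(context, url), WEBSITE_URLS)`; results come back in the same order as the input URLs.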
Configuration Options
Browser Launch Options
| Option | Type | Description |
|---|---|---|
| headless | boolean | Run browser without visible UI |
| args | list | Additional browser arguments |
| viewport | dict | Browser viewport dimensions |
Query Options
| Option | Description |
|---|---|
| timeout | Maximum wait time for query results |
| retry_count | Number of retry attempts on failure |
| strict_mode | Enable strict element matching |
Best Practices
Error Handling
- Implement retry logic for network failures
- Handle authentication session expiration gracefully
- Use appropriate timeouts for slow-loading pages
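The retry advice above can be sketched with exponential backoff. This is a generic pattern, not an AgentQL feature; `with_retries` is a hypothetical helper, and the attempt count and base delay are arbitrary defaults.

```python
import time


def with_retries(action, attempts: int = 3, base_delay: float = 1.0, sleep=time.sleep):
    """Call action(); on failure wait base_delay * 2**n seconds and try again."""
    for attempt in range(attempts):
        try:
            return action()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
```

A navigation step could be wrapped as `with_retries(lambda: page.goto(url, timeout=60000))`. The `sleep` parameter exists so tests can replace the real delay.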
Resource Management
- Close browser contexts when operations complete
- Use headless mode for production deployments
- Reuse browser instances for multiple queries when possible
Security Considerations
- Store credentials securely (environment variables, secrets management)
- Implement session timeout policies
- Validate SSL certificates for production use
Source: https://github.com/tinyfish-io/agentql / Human Manual
Doramagic Pitfall Log
Source-linked risks stay visible on the manual page so the preview does not read like a recommendation.
Found 8 structured pitfall item(s), including 0 high/blocking item(s). Top priority: Installation risk - Installation risk requires verification.
1. Installation risk: Installation risk requires verification
- Severity: medium
- Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | cevd_761b694cc0e94100b46ba5683041137b | https://github.com/tinyfish-io/agentql/issues/114
2. Installation risk: Installation risk requires verification
- Severity: medium
- Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | cevd_55a8aa1466634fb39e0b679f753270ec | https://github.com/tinyfish-io/agentql/issues/148
3. Capability evidence risk: Capability evidence risk requires verification
- Severity: medium
- Finding: README/documentation is current enough for a first validation pass.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: capability.assumptions | github_repo:760722197 | https://github.com/tinyfish-io/agentql
4. Maintenance risk: Maintenance risk requires verification
- Severity: medium
- Finding: Project evidence flags a maintenance risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: evidence.maintainer_signals | github_repo:760722197 | https://github.com/tinyfish-io/agentql
5. Security or permission risk: Security or permission risk requires verification
- Severity: medium
- Finding: no_demo
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: downstream_validation.risk_items | github_repo:760722197 | https://github.com/tinyfish-io/agentql
6. Security or permission risk: Security or permission risk requires verification
- Severity: medium
- Finding: no_demo
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: risks.scoring_risks | github_repo:760722197 | https://github.com/tinyfish-io/agentql
7. Maintenance risk: Maintenance risk requires verification
- Severity: low
- Finding: issue_or_pr_quality=unknown.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: evidence.maintainer_signals | github_repo:760722197 | https://github.com/tinyfish-io/agentql
8. Maintenance risk: Maintenance risk requires verification
- Severity: low
- Finding: release_recency=unknown.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: evidence.maintainer_signals | github_repo:760722197 | https://github.com/tinyfish-io/agentql
Source: Doramagic discovery, validation, and Project Pack records
Community Discussion Evidence
These external discussion links are review inputs, not standalone proof that the project is production-ready.
Open the linked issues or discussions before treating the pack as ready for your environment.
Doramagic exposes project-level community discussion separately from official documentation. Review these links before using agentql with real data or production workflows.
- Starlog published a deep-dive on tinyfish-io/agentql - github / github_issue
- Dependency Dashboard - github / github_issue
- Capability evidence risk requires verification - GitHub / issue
Source: Project Pack community evidence and pitfall evidence