# trafilatura

Canonical URL: https://doramagic.ai/en/projects/trafilatura/

Source repository: https://github.com/adbar/trafilatura

## What it is

A Python package and command-line tool for gathering text and metadata from the Web. It covers crawling, scraping, and extraction, with output as CSV, JSON, HTML, Markdown (MD), TXT, or XML.

## Capability boundary

skill, recipe, host_instruction, eval, preflight

## First safe verification

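Verify the smallest working path in an isolated environment, such as a fresh virtual environment, and keep a rollback path, for example by being able to delete that environment afterwards.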
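
One possible shape for such a check is sketched below. It assumes the package is importable as `trafilatura` and uses its `extract` function on an inline HTML string, so no network access is involved. The sample page, the `smoke_test` helper, and the status labels are hypothetical names invented for this sketch; they are not part of the project.

```python
import importlib.util
from typing import Optional, Tuple

# Hypothetical sample page, made up for this sketch; any small HTML string would do.
SAMPLE_HTML = """<html><head><title>Smoke test page</title></head><body>
<article>
<h1>Smoke test page</h1>
<p>This paragraph exists only to give the extractor some main text to find.
It is deliberately longer than a single short sentence so that it looks like
real article content rather than navigation or boilerplate.</p>
<p>A second paragraph adds a little more body text to the sample document
so that the page has a clear main content area.</p>
</article>
</body></html>"""


def smoke_test(html: str = SAMPLE_HTML) -> Tuple[str, Optional[str]]:
    """Run the smallest extraction path and report what happened.

    Returns a (status, text) pair where status is one of
    "not_installed", "no_output", or "ok" (labels chosen for this sketch).
    """
    # Check for the package first so a missing install is reported
    # as a status instead of raising ImportError.
    if importlib.util.find_spec("trafilatura") is None:
        return ("not_installed", None)

    import trafilatura

    # extract() takes the HTML as a string and returns the main text,
    # or None when nothing usable is found.
    text = trafilatura.extract(html)
    if not text:
        return ("no_output", None)
    return ("ok", text)


if __name__ == "__main__":
    status, text = smoke_test()
    print("status:", status)
    if text:
        print(text)
```

Running this inside a throwaway virtual environment (for instance one created with `python -m venv` and populated with `pip install trafilatura`) keeps the check isolated; deleting that environment's directory then serves as the rollback.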

## Main risk

Installing and running the tool may add setup, validation, or first-run risk for the user.

## Evidence base

https://github.com/adbar/trafilatura, https://github.com/adbar/trafilatura#readme, Human Manual, Pitfall Log
