# olmocr

Canonical URL: https://doramagic.ai/en/projects/olmocr/

Source repository: https://github.com/allenai/olmocr

## What it is

A toolkit for linearizing PDFs into clean, readable text for LLM datasets and training.

## Capability boundary

`skill`, `recipe`, `host_instruction`, `eval`, `preflight`

## First safe verification

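Verify the smallest working path (for example, converting a single short PDF) in an isolated environment, such as a fresh virtual environment or container, and keep a rollback path so the setup can be undone cleanly.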
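The verification step above can be sketched with Python's standard library alone. This is a minimal illustration, not an olmocr tool: it creates a throwaway virtual environment in a temporary directory, runs the smallest possible command inside it, and deletes the directory on exit, which serves as the rollback path. The helpers `isolated_env` and `smoke_check` are hypothetical names; installing olmocr and converting one PDF would replace the smoke check, using the commands from the README for the version you install.

```python
import contextlib
import shutil
import subprocess
import sys
import tempfile
import venv
from pathlib import Path


@contextlib.contextmanager
def isolated_env():
    """Yield (root, python) for a throwaway virtual environment, then delete it."""
    root = Path(tempfile.mkdtemp(prefix="olmocr-verify-"))
    try:
        # Symlink the interpreter on POSIX (as `python -m venv` does); skip pip for speed.
        venv.EnvBuilder(with_pip=False, symlinks=sys.platform != "win32").create(root)
        if sys.platform == "win32":
            python = root / "Scripts" / "python.exe"
        else:
            python = root / "bin" / "python"
        yield root, python
    finally:
        # Rollback path: everything lives under `root`, so deleting it undoes the setup.
        shutil.rmtree(root, ignore_errors=True)


def smoke_check(python):
    """Run the smallest possible command with the isolated interpreter; return its sys.prefix."""
    result = subprocess.run(
        [str(python), "-c", "import sys; print(sys.prefix)"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    with isolated_env() as (root, python):
        # Hypothetical next steps (not executed here): install olmocr into this
        # environment and convert one small PDF with the README's command.
        inside = Path(smoke_check(python)).resolve() == root.resolve()
        print("isolated:", inside)
    print("rolled back:", not root.exists())
```

Keeping all state under one temporary directory is what makes rollback trivial: nothing is written into the host interpreter, so deleting the directory is a complete undo.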

## Main risk

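Installing and running olmocr may increase setup, validation, or first-run risk for the user, for example through GPU, system-package, or model-download requirements that must be satisfied before the first conversion succeeds.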
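A preflight check run before installing can surface these setup problems early. The sketch below is a minimal, standard-library-only example under stated assumptions: prerequisites are modeled as command-line tools that must be on `PATH` plus a free-disk-space threshold. The function name `preflight`, the tool names, and the 30 GB figure are illustrative placeholders, not olmocr's official requirements; take the real list from the README for the version you install.

```python
import shutil
from pathlib import Path


def preflight(tools, min_free_gb, path="."):
    """Return a list of human-readable problems; an empty list means all checks passed.

    `tools` and `min_free_gb` are placeholders: fill them in from the
    requirements listed in the olmocr README for your version.
    """
    problems = []
    for tool in tools:
        # shutil.which returns None when the executable is not found on PATH.
        if shutil.which(tool) is None:
            problems.append(f"missing tool on PATH: {tool}")
    free_gb = shutil.disk_usage(Path(path)).free / 1e9
    if free_gb < min_free_gb:
        problems.append(f"only {free_gb:.1f} GB free at {path}, need {min_free_gb} GB")
    return problems


if __name__ == "__main__":
    # Hypothetical requirement list: pdftoppm ships with poppler-utils and
    # nvidia-smi with the NVIDIA driver; 30 GB is a placeholder threshold.
    for problem in preflight(["pdftoppm", "nvidia-smi"], min_free_gb=30):
        print(problem)
```

Returning a list of problems rather than raising on the first failure lets a user see every missing prerequisite in one run.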

## Evidence base

The olmocr repository (https://github.com/allenai/olmocr) and its README (https://github.com/allenai/olmocr#readme), plus the Human Manual and the Pitfall Log.
