# unstructured - Doramagic AI Context Pack

> Purpose: pre-work context for the user's host AI. This pack does not prove that the project has been installed, run, or validated.

## Project

- canonical_name: `Unstructured-IO/unstructured`
- capability: Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website to learn more about our enterprise-grade Platform product for production-grade workflows, partitioning, enrichments, chunking, and embedding.
- expected_user_outcome: Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website to learn more about our enterprise-grade Platform product for production-grade workflows, partitioning, enrichments, chunking, and embedding.

## Operating Boundaries

- Do not claim that the project has been installed, run, called through an API, or used on local files unless separate evidence proves it.
- Project facts must come from repo evidence, Claim Graph, or explicit source references.
- When a capability is not verified, mark it as unverified instead of completing it as fact.
- publish_status: `publishable`
- blocking_gaps: none

---

## Doramagic Context Augmentation

The following sections strengthen the repository context for a host AI. Human Manual data is a reading route, and pitfall notes become operating constraints.

## Human Manual Outline

Usage rule: this is only a reading route and salience signal, not factual authority. Concrete claims must still return to repo evidence or Claim Graph.

Host AI hard rules:
- Do not treat page titles, section order, summaries, or importance values as factual project evidence.
- When explaining the Human Manual outline, state that it is only a reading route or salience signal.
- Capability, installation, compatibility, runtime state, and risk claims must cite repo evidence, source paths, or Claim Graph.

- **Overview, Installation, and Quick Start**: importance `high`
  - source_paths: README.md, pyproject.toml, Dockerfile, unstructured/cli.py, unstructured/doctor.py
- **Document Partitioning Pipeline**: importance `high`
  - source_paths: unstructured/partition/auto.py, unstructured/partition/strategies.py, unstructured/partition/pdf.py, unstructured/partition/pdf_image/__init__.py, unstructured/partition/pdf_image/form_extraction.py
- **Elements, Chunking, and Output Formats**: importance `high`
  - source_paths: unstructured/documents/elements.py, unstructured/documents/ontology.py, unstructured/documents/coordinates.py, unstructured/documents/mappings.py, unstructured/chunking/base.py
- **Embeddings, Connectors, and Metrics**: importance `medium`
  - source_paths: unstructured/embed/openai.py, unstructured/embed/voyageai.py, unstructured/embed/vertexai.py, unstructured/embed/bedrock.py, unstructured/embed/huggingface.py

## Repo Inspection Evidence

- repo_clone_verified: true
- repo_inspection_verified: true
- repo_commit: `5ead69ad146986a647ccbb4219ce94844710f4a9`
- inspected_files: `Dockerfile`, `README.md`, `pyproject.toml`, `uv.lock`

Host AI hard rules:
- Without repo_clone_verified=true, do not claim that the source code has been read.
- Without repo_inspection_verified=true, do not write README, docs, or package-file conclusions as facts.
- Without quick_start_verified=true, do not claim that the Quick Start path has run successfully.
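
The three rules above amount to a gate on what a host AI may assert. A minimal sketch of that gate follows, assuming the flags are available as booleans; the class name, function name, and claim labels are hypothetical and not part of this pack or of the project. Since `quick_start_verified` is not listed in the evidence above, the sketch treats it as false.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InspectionEvidence:
    """Verification flags, named after the fields used in this pack."""
    repo_clone_verified: bool = False
    repo_inspection_verified: bool = False
    # Not reported in the evidence section, so it stays False by default.
    quick_start_verified: bool = False


def allowed_claims(evidence: InspectionEvidence) -> set[str]:
    """Return the claim categories the flags permit (labels are hypothetical)."""
    claims: set[str] = set()
    if evidence.repo_clone_verified:
        claims.add("source_code_read")
    if evidence.repo_inspection_verified:
        claims.add("readme_and_package_file_facts")
    if evidence.quick_start_verified:
        claims.add("quick_start_ran")
    return claims


# Flags as reported in the Repo Inspection Evidence section.
pack = InspectionEvidence(repo_clone_verified=True, repo_inspection_verified=True)
print(sorted(allowed_claims(pack)))
```

Defaulting every flag to false keeps the gate conservative: a missing flag blocks the claim rather than permitting it.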

## Doramagic Pitfall Constraints

These rules come from Doramagic discovery, validation, or compilation findings. The host AI must treat them as operating constraints, not background notes.

### Constraint 1: Installation risk requires verification

- Trigger: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- Host AI rule: Reproduce the official install and quickstart path in an isolated environment.
- Why it matters: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/Unstructured-IO/unstructured/issues/3871
- Hard boundary: Do not present this pitfall as solved, verified, or ignorable unless later evidence explicitly closes it.

### Constraint 2: Installation risk requires verification

- Trigger: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- Host AI rule: Reproduce the official install and quickstart path in an isolated environment.
- Why it matters: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/Unstructured-IO/unstructured/issues/4320
- Hard boundary: Do not present this pitfall as solved, verified, or ignorable unless later evidence explicitly closes it.

### Constraint 3: Security or permission risk requires verification

- Trigger: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
- Host AI rule: Reproduce the official install and quickstart path in an isolated environment.
- Why it matters: May increase setup, validation, or first-run risk for the user.
- Evidence: packet_text.keyword_scan | https://github.com/Unstructured-IO/unstructured
- Hard boundary: Do not present this pitfall as solved, verified, or ignorable unless later evidence explicitly closes it.

### Constraint 4: Installation risk requires verification

- Trigger: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- Host AI rule: Reproduce the official install and quickstart path in an isolated environment.
- Why it matters: May increase setup, validation, or first-run risk for the user.
- Evidence: identity.distribution | https://github.com/Unstructured-IO/unstructured
- Hard boundary: Do not present this pitfall as solved, verified, or ignorable unless later evidence explicitly closes it.

### Constraint 5: Capability evidence risk requires verification

- Trigger: The pack assumes the README/documentation is current enough for a first validation pass; treat that assumption as unverified.
- Host AI rule: Reproduce the official install and quickstart path in an isolated environment.
- Why it matters: May increase setup, validation, or first-run risk for the user.
- Evidence: capability.assumptions | https://github.com/Unstructured-IO/unstructured
- Hard boundary: Do not present this pitfall as solved, verified, or ignorable unless later evidence explicitly closes it.

### Constraint 6: Runtime risk requires verification

- Trigger: Developers should check this runtime risk before relying on the project: Number getting converted into scientific notation in metadata.text_as_html
- Host AI rule: Before packaging this project, run the relevant install/config/quickstart check for: Number getting converted into scientific notation in metadata.text_as_html. Context: observed when using Python.
- Why it matters: Developers may hit a documented source-backed failure mode: Number getting converted into scientific notation in metadata.text_as_html
- Evidence: failure_mode_cluster:github_issue | https://github.com/Unstructured-IO/unstructured/issues/3871
- Hard boundary: Do not present this pitfall as solved, verified, or ignorable unless later evidence explicitly closes it.
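
One way to check for this failure mode during validation is to scan an HTML table string for tokens that look like scientific notation. The sketch below uses only the Python standard library and does not call the project itself. The sample HTML, the regular expression, and the helper name `find_scientific_notation` are illustrative assumptions, not taken from the linked issue. In a real check, the input string would be whatever value appears in `metadata.text_as_html`.

```python
import re
from html.parser import HTMLParser

# Matches standalone tokens such as 1.23457e+11 or 4E-05 (illustrative pattern).
SCI_NOTATION = re.compile(r"(?<![\w.])[+-]?\d+(?:\.\d+)?[eE][+-]?\d+(?![\w.])")


class _TextCollector(HTMLParser):
    """Collects the text nodes of an HTML fragment, ignoring tags."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: list[str] = []

    def handle_data(self, data: str) -> None:
        self.chunks.append(data)


def find_scientific_notation(html: str) -> list[str]:
    """Return tokens in the HTML text that look like scientific notation."""
    parser = _TextCollector()
    parser.feed(html)
    parser.close()
    hits: list[str] = []
    for chunk in parser.chunks:
        hits.extend(SCI_NOTATION.findall(chunk))
    return hits


# Hypothetical table fragment; the second and third cells should not be flagged.
sample = "<table><tr><td>1.23457e+11</td><td>42</td><td>ABC-1e5X</td></tr></table>"
print(find_scientific_notation(sample))
```

A non-empty result only flags candidates for review. Some source documents legitimately contain scientific notation, so a hit should be compared against the original cell value before it is reported as a conversion error.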

### Constraint 7: Maintenance risk requires verification

- Trigger: Project evidence flags a maintenance risk. Review the linked source before relying on this workflow.
- Host AI rule: Reproduce the official install and quickstart path in an isolated environment.
- Why it matters: May increase setup, validation, or first-run risk for the user.
- Evidence: evidence.maintainer_signals | https://github.com/Unstructured-IO/unstructured
- Hard boundary: Do not present this pitfall as solved, verified, or ignorable unless later evidence explicitly closes it.

### Constraint 8: Security or permission risk requires verification

- Trigger: no_demo
- Host AI rule: Reproduce the official install and quickstart path in an isolated environment.
- Why it matters: May increase setup, validation, or first-run risk for the user.
- Evidence: downstream_validation.risk_items | https://github.com/Unstructured-IO/unstructured
- Hard boundary: Do not present this pitfall as solved, verified, or ignorable unless later evidence explicitly closes it.

### Constraint 9: Security or permission risk requires verification

- Trigger: no_demo
- Host AI rule: Reproduce the official install and quickstart path in an isolated environment.
- Why it matters: May increase setup, validation, or first-run risk for the user.
- Evidence: risks.scoring_risks | https://github.com/Unstructured-IO/unstructured
- Hard boundary: Do not present this pitfall as solved, verified, or ignorable unless later evidence explicitly closes it.

### Constraint 10: Runtime risk requires verification

- Trigger: Developers should check this performance risk before relying on the project: [Feature Request] Add document layout analysis confidence scores
- Host AI rule: Before packaging this project, run the relevant install/config/quickstart check for: [Feature Request] Add document layout analysis confidence scores. Context: observed when using Python.
- Why it matters: Developers may hit a documented source-backed failure mode: [Feature Request] Add document layout analysis confidence scores
- Evidence: failure_mode_cluster:github_issue | https://github.com/Unstructured-IO/unstructured/issues/4320
- Hard boundary: Do not present this pitfall as solved, verified, or ignorable unless later evidence explicitly closes it.
