Instructor Structured Output

Instructor: declare a Pydantic BaseModel and get typed instances back from 20 LLM providers. At its core, a monkey-patch (instructor.patch / from_*) intercepts create(), injects schema-aware kwargs, and retries with tenacity.


Crystal Overview

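Instructor is a Python framework that binds Pydantic BaseModel classes directly to LLM output (github.com/jxnl/instructor). Core mechanism: a monkey-patch (instructor.patch / instructor.from_*) intercepts the provider client's create() call, injects schema-aware kwargs, runs the call inside a tenacity retry loop, and validates the JSON response against the model. On a ValidationError it rewrites the prompt with failed_attempts serialized as XML and tries again.

It supports 20 providers × 36 Mode enum values = 720 (provider, mode) combinations, dispatched through two dict tables. OpenAI is the default monkey-patch target (Mode.TOOLS by default); Anthropic, Google (gemini / vertexai / genai), and 9 SaaS providers each have their own from_* factory.

This skill ships with 47 constraints (4 of them fatal) covering typical pitfalls: the failed_attempts XML grows linearly with every retry (max_retries=5 can exceed the context window); from_openai validates the mode with assert (silently stripped under python -O); and ollama / azure_openai / google / litellm fall through to Provider.UNKNOWN (neither the assert nor ModeError fires).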
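The retry-and-reask loop described above, and the linear prompt growth it causes, can be illustrated with a standard-library toy. This is a sketch under stated assumptions, not instructor's implementation: User, validate, reask_loop, the fake LLM, and the XML layout of each failed attempt are all hypothetical stand-ins, and max_retries is treated here as the total number of attempts.

```python
import json
from dataclasses import dataclass


@dataclass
class User:
    name: str
    age: int


def validate(raw: str) -> User:
    """Parse JSON and check it against the User shape (stand-in for Pydantic)."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    if not isinstance(data.get("name"), str) or not isinstance(data.get("age"), int):
        raise ValueError(f"schema mismatch: {data!r}")
    return User(name=data["name"], age=data["age"])


def reask_loop(fake_llm, prompt: str, max_retries: int):
    """Call fake_llm until the reply validates; return (user, prompt sizes)."""
    failed_attempts = []  # hypothetical XML fragments, one per failed attempt
    prompt_sizes = []
    for attempt in range(1, max_retries + 1):
        full_prompt = prompt + "".join(failed_attempts)
        prompt_sizes.append(len(full_prompt))
        raw = fake_llm(full_prompt)
        try:
            return validate(raw), prompt_sizes
        except ValueError as exc:
            # Every failure is appended, so the next prompt is strictly longer.
            failed_attempts.append(
                f'\n<failed_attempt number="{attempt}">'
                f"<completion>{raw}</completion><error>{exc}</error>"
                "</failed_attempt>"
            )
    raise RuntimeError(f"no valid reply after {max_retries} attempts")


# Fake LLM: two malformed replies, then a valid one.
replies = iter([
    '{"name": "Ada"}',
    '{"name": "Ada", "age": "36"}',
    '{"name": "Ada", "age": 36}',
])
user, sizes = reask_loop(lambda _prompt: next(replies), "Extract the user.", max_retries=5)
print(user)
print(sizes)  # each entry is larger than the previous one
```

Because every failed completion and its error text are carried forward, the prompt size increases on each attempt, which is why a high retry budget combined with long completions can approach the context window.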

Blueprint Source

finance-bp-139

jxnl/instructor · 3f1d6dd · 1 source file

Constraints

4 total
4 fatal
4 must-not-violate

Evidence Quality

Confidence: 90%

High confidence — strong evidence base

4 must-not-violate constraints

FATAL · domain_rule · instructor-C-001

WHEN: deploying instructor with from_openai (or routes converging on it: OpenAI / OpenRouter / Anyscale / Together / Databricks) to production

ACTION (must not): run the Python interpreter with the -O optimization flag, because from_openai validates the (provider, mode) pair via Python assert statements, which -O strips silently

CONSEQUENCE: Under python -O, the assert mode in {...} blocks in from_openai are removed. Invalid (provider, mode) combinations then reach the LLM call, producing malformed kwargs, undefined provider responses, or silently wrong-shaped completions across 5 OpenAI-family base_urls.
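The assert-stripping behaviour behind C-001 can be reproduced without instructor. The sketch below is not from_openai's code: ALLOWED_MODES, check_with_assert, and check_with_raise are hypothetical names, and compile(..., optimize=1) is used as an in-process stand-in for launching the interpreter with -O.

```python
# Two hypothetical validators: one relies on assert, one raises explicitly.
SOURCE = '''
ALLOWED_MODES = {"tools", "json"}

def check_with_assert(mode):
    assert mode in ALLOWED_MODES, f"invalid mode: {mode}"
    return mode

def check_with_raise(mode):
    if mode not in ALLOWED_MODES:
        raise ValueError(f"invalid mode: {mode}")
    return mode
'''


def load(optimize: int) -> dict:
    """Compile SOURCE at the given optimization level and return its namespace."""
    namespace = {}
    code = compile(SOURCE, "<mode_checks>", "exec", optimize=optimize)
    exec(code, namespace)
    return namespace


def accepts(check, mode: str) -> bool:
    """Return True if the validator lets the mode through."""
    try:
        check(mode)
        return True
    except (AssertionError, ValueError):
        return False


normal = load(optimize=0)      # like plain `python`
optimized = load(optimize=1)   # like `python -O`: assert statements are removed

print(accepts(normal["check_with_assert"], "bogus"))     # False: assert fires
print(accepts(optimized["check_with_assert"], "bogus"))  # True: assert was stripped
print(accepts(optimized["check_with_raise"], "bogus"))   # False: explicit raise survives
```

An explicit raise keeps the check alive at every optimization level, so a caller-side guard of that shape is one way to stay protected when the deployment environment sets -O or PYTHONOPTIMIZE.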

FATAL · domain_rule · instructor-C-003

WHEN: pointing instructor at self-hosted OpenAI-compatible endpoints (vLLM / TGI / Ollama / LiteLLM proxy) or at providers whose base_url is not in the 16-substring table (azure_openai / google / litellm / ollama)

ACTION (must not): rely on instructor's automatic mode validation, because get_provider() will return Provider.UNKNOWN. Neither the from_openai assert blocks nor the raise ModeError branches fire, leaving the (provider, mode) pair entirely unchecked

CONSEQUENCE: Self-hosted endpoints fall to Provider.UNKNOWN. The assert blocks dispatch on Provider enum values (OPENROUTER / ANYSCALE / TOGETHER / OPENAI / DATABRICKS), so all assertions silently pass and provider-specific optimizations are skipped. Debugging wrong-shaped responses then requires reading the dispatch table source.
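The fall-through to Provider.UNKNOWN can be pictured with a small substring lookup, followed by a caller-side guard that refuses unchecked pairs. Everything here is an illustrative assumption: the three-entry SUBSTRING_TABLE, the SUPPORTED map, detect_provider, and checked_provider are hypothetical and do not reproduce instructor's actual 16-substring table or its get_provider() logic.

```python
from enum import Enum
from typing import Optional, Set


class Provider(Enum):
    OPENAI = "openai"
    OPENROUTER = "openrouter"
    TOGETHER = "together"
    UNKNOWN = "unknown"


# Hypothetical, abbreviated lookup table (the real one is said to hold 16 substrings).
SUBSTRING_TABLE = {
    "api.openai.com": Provider.OPENAI,
    "openrouter": Provider.OPENROUTER,
    "together": Provider.TOGETHER,
}

# Hypothetical per-provider mode whitelist.
SUPPORTED = {
    Provider.OPENAI: {"tools", "json"},
    Provider.OPENROUTER: {"tools", "json"},
    Provider.TOGETHER: {"tools"},
}


def detect_provider(base_url: str) -> Provider:
    """Return the first provider whose substring occurs in base_url, else UNKNOWN."""
    for needle, provider in SUBSTRING_TABLE.items():
        if needle in base_url:
            return provider
    return Provider.UNKNOWN


def checked_provider(base_url: str, mode: str,
                     unknown_allowed_modes: Optional[Set[str]] = None) -> Provider:
    """Validate (provider, mode) explicitly, including the UNKNOWN case."""
    provider = detect_provider(base_url)
    if provider is Provider.UNKNOWN:
        # No table entry: require the caller to state which modes were tested.
        if not unknown_allowed_modes or mode not in unknown_allowed_modes:
            raise ValueError(f"unverified mode {mode!r} for unknown endpoint {base_url}")
        return provider
    if mode not in SUPPORTED[provider]:
        raise ValueError(f"mode {mode!r} not supported by {provider.value}")
    return provider


print(detect_provider("https://api.openai.com/v1"))   # Provider.OPENAI
print(detect_provider("http://localhost:11434/v1"))   # Provider.UNKNOWN
print(checked_provider("http://localhost:11434/v1", "json",
                       unknown_allowed_modes={"json"}))  # Provider.UNKNOWN
```

The design choice in the guard is to make the unknown case fail closed: a self-hosted endpoint only passes when the caller names the modes that were actually tested against it.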

FATAL · domain_rule · instructor-C-007

WHEN: passing max_retries to instructor (especially via from_provider)

ACTION (must not): treat max_retries as a single semantic. It appears at three independent code points with different defaults: patch.py default=1 (reask only), Instructor.create default=3 (reask only), and auto_client.py:180-185, which transparently passes it to openai.OpenAI(max_retries=...), the SDK's HTTP-level retry (network only). A single max_retries=5 passed to from_provider can yield 5 reasks × 5 SDK HTTP retries = 25 worst-case API calls

CONSEQUENCE: Passing one max_retries through from_provider transparently amplifies into both the instructor reask layer and the SDK HTTP retry layer, producing up to N×N API calls. On rate-limited or pay-per-call providers this drains the cost budget and triggers vendor throttling cascades within a single user request.
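The N×N amplification is plain nested-loop arithmetic, sketched below with a hypothetical worst_case_calls helper. It assumes the worst case stated above: every reask triggers a fresh request, and every request exhausts its HTTP-level attempts. Counting conventions (for example, whether the first attempt is counted as a retry) may shift the exact total in a real client.

```python
def worst_case_calls(reask_attempts: int, http_attempts: int) -> int:
    """Count API calls when both retry layers are fully exhausted."""
    calls = 0
    for _ in range(reask_attempts):      # outer layer: validation-driven reasks
        for _ in range(http_attempts):   # inner layer: network-driven HTTP retries
            calls += 1
    return calls


# One shared value feeding both layers, as in the max_retries=5 example.
print(worst_case_calls(5, 5))  # 25

# Budgeting the layers separately keeps the product under control.
print(worst_case_calls(3, 2))  # 6
```

Treating the two layers as separate budgets, rather than one number, makes the worst-case cost of a single user request explicit before it reaches a rate-limited or pay-per-call provider.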

FAQ


Changelog

v0.1.0 · 2026-04-25 · Contributor: tangweigang-jpg

v0.1.0: First release on Doramagic.ai. A skill for jxnl/instructor, the Pydantic structured-output framework: bilingual (Chinese and English), with 47 anti-pattern constraints (4 fatal) and 3 FAQ entries.
