Who is this skill for, and what tasks can it handle?

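It is aimed at engineers building enterprise knowledge bases, document Q&A, and RAG applications: build an index from heterogeneous documents such as PDF, Markdown, and web pages, then use retrieve-then-synthesize to supply an LLM with external knowledge. It covers agent paradigms including FunctionAgent, ReActAgent, and CodeActAgent. See doramagic.ai/r/llama-index for the full set of use cases.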

What environment and dependencies do I need?

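Python 3.9+, at least one LLM provider (implicit default: OpenAI gpt-3.5-turbo), and one embedding provider (implicit default: OpenAI text-embedding-ada-002, 1536 dimensions). The default vector store is the in-memory SimpleVectorStore; persistence requires installing the corresponding integration package.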
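Because the LLM and embedding defaults are implicit, it can help to pin them explicitly at startup. The following is a minimal sketch, not an official recipe: the helper name `configure_llamaindex` is hypothetical, the model names `gpt-4o-mini` and `text-embedding-3-small` are placeholder assumptions, and it assumes the `llama-index-llms-openai` and `llama-index-embeddings-openai` integration packages are installed and `OPENAI_API_KEY` is set.

```python
def configure_llamaindex(llm_model="gpt-4o-mini",
                         embed_model="text-embedding-3-small"):
    """Pin the LLM, embedder, and splitter on the global Settings singleton.

    The model names are illustrative assumptions; substitute your own.
    """
    # Imported lazily so this module can be loaded before the
    # integration packages are installed.
    from llama_index.core import Settings
    from llama_index.core.node_parser import SentenceSplitter
    from llama_index.embeddings.openai import OpenAIEmbedding
    from llama_index.llms.openai import OpenAI

    Settings.llm = OpenAI(model=llm_model)
    Settings.embed_model = OpenAIEmbedding(model=embed_model)
    # State the overlap explicitly instead of inheriting the splitter default.
    Settings.node_parser = SentenceSplitter(chunk_size=1024, chunk_overlap=20)
    return Settings
```

Calling `configure_llamaindex()` once, before any index or query engine is built, keeps every later component on the same explicitly chosen models.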

What pitfalls will I hit, and how does this skill guard against them?

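This skill ships with 52 built-in constraints (5 fatal). Typical pitfalls: (1) ServiceContext has been hard-removed (not merely deprecated), and 3 entry points raise ValueError directly; (2) SentenceSplitter's chunk_overlap defaults to 200, which does not match constants.DEFAULT_CHUNK_OVERLAP=20, the value documentation often cites.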
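To see why the overlap default matters, here is a rough illustration using a plain fixed-size sliding window over token positions. This is not SentenceSplitter's actual sentence-aware algorithm; the function name `window_starts` is hypothetical, and the chunk size of 1024 is an assumption made for the arithmetic.

```python
def window_starts(n_tokens, chunk_size, chunk_overlap):
    """Return the start offsets of fixed-size windows covering n_tokens."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    stride = chunk_size - chunk_overlap  # how far each new window advances
    starts = []
    start = 0
    while True:
        starts.append(start)
        if start + chunk_size >= n_tokens:
            break  # this window already reaches the end of the text
        start += stride
    return starts


# Same 10,000-token document, two overlap settings.
chunks_overlap_200 = len(window_starts(10_000, 1024, 200))
chunks_overlap_20 = len(window_starts(10_000, 1024, 20))
```

Under these assumptions an overlap of 200 yields 12 windows and an overlap of 20 yields 10, so assuming the wrong default changes both the embedding cost and the text each chunk contains.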

LlamaIndex RAG Framework

LlamaIndex: a Python framework that turns arbitrary documents into knowledge an LLM can query. 4 pillars (Index/Retriever/QueryEngine/Synthesizer) + 52 anti-pattern constraints (5 fatal).

AI · Machine learning · Data

✓ 0 reported successes · v0.1.0 · Updated 2026-04-25

Crystal overview

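LlamaIndex is a Python framework that turns arbitrary documents into knowledge an LLM can query (github.com/run-llama/llama_index). Its four pillars (Index / Retriever / QueryEngine / ResponseSynthesizer) make the retrieve-and-synthesize loop configurable; the ingestion pipeline handles the Document → Node → Embedding → Index transformation with content-hash caching; and the workflow / agent submodules (FunctionAgent / ReActAgent / CodeActAgent / multi-agent) layer tool-calling on top of the external 'workflows' primitives. Settings (a singleton) replaces the v0.9 ServiceContext as the global configuration surface (ServiceContext has been hard-removed, and 3 entry points raise ValueError directly). This skill ships with 52 constraints (5 fatal) covering typical pitfalls: the hard removal of ServiceContext (not a deprecation); SentenceSplitter's chunk_overlap defaulting to 200 (not constants.DEFAULT_CHUNK_OVERLAP=20); the embedding model's identity not being persisted to index_struct / storage_context; and CJK or multilingual corpora running into punkt's English tokenization under the default SentenceSplitter. The host AI applies these constraints automatically.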
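For the CJK pitfall mentioned above, one possible workaround is a punctuation-based sentence splitter that does not depend on punkt. The sketch below uses only the standard library; the function name `split_cjk_sentences` is hypothetical, and the punctuation set is an assumption that may need extending for a given corpus.

```python
import re

# One sentence = a run of non-terminator characters followed by any
# sentence-ending punctuation (full-width or ASCII) that comes after it.
_SENTENCE = re.compile(r"[^。！？!?\n]+[。！？!?]*")


def split_cjk_sentences(text):
    """Split text into sentences on CJK and ASCII end-of-sentence marks."""
    return [s.strip() for s in _SENTENCE.findall(text) if s.strip()]


sentences = split_cjk_sentences("今天天气很好。我们去公园吧！好吗？")
```

If the installed SentenceSplitter version exposes a hook for a custom sentence tokenizer (for example a `chunking_tokenizer_fn` argument), a function like this could be passed there; check the signature of your installed version before relying on it.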

Blueprint Source

finance-bp-135

run-llama/llama_index · 0a6c90b1 · source file

Constraints

5 total

5 fatal

5 must-not-violate

Evidence Quality

Confidence: 90%

High confidence: strong evidence base

5 must-not-violate constraints

FATAL · domain_rule

WHEN: When porting code from a llama-index v0.9 era tutorial / blog / Stack Overflow answer that constructs a ServiceContext object

ACTION: Delete every ServiceContext.from_defaults / ServiceContext(...) / set_global_service_context(...) call. Replace with attribute assignments on the module-level Settings singleton (e.g. Settings.llm = OpenAI(...), Settings.embed_model = OpenAIEmbedding(...), Settings.node_parser = SentenceSplitter(chunk_overlap=20)) BEFORE any index/query construction. Do not pass a ServiceContext kwarg to BaseIndex.from_documents.

CONSEQUENCE: undefined behavior

domain-constraint
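When porting older code, it may help to locate every legacy call mechanically before editing. The following is a small sketch that scans Python source with the standard-library `ast` module; the function name `find_legacy_service_context` is hypothetical, and it assumes the legacy keyword argument is spelled `service_context`, as in v0.9-era examples.

```python
import ast

# Bare callables that must be deleted when migrating to Settings.
LEGACY_CALLS = {"ServiceContext", "set_global_service_context"}


def find_legacy_service_context(source):
    """Return sorted (line, description) pairs for legacy ServiceContext usage."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if isinstance(func, ast.Name) and func.id in LEGACY_CALLS:
            # ServiceContext(...) or set_global_service_context(...)
            hits.append((node.lineno, func.id))
        elif (isinstance(func, ast.Attribute)
              and isinstance(func.value, ast.Name)
              and func.value.id == "ServiceContext"):
            # ServiceContext.from_defaults(...) and similar class-level calls
            hits.append((node.lineno, f"ServiceContext.{func.attr}"))
        for kw in node.keywords:
            if kw.arg == "service_context":
                # e.g. VectorStoreIndex.from_documents(docs, service_context=sc)
                hits.append((node.lineno, "service_context="))
    return sorted(hits)


sample = (
    "sc = ServiceContext.from_defaults(llm=llm)\n"
    "set_global_service_context(sc)\n"
    "index = VectorStoreIndex.from_documents(docs, service_context=sc)\n"
)
legacy_hits = find_legacy_service_context(sample)
```

The scan only parses the code and never imports it, so it can run on tutorials or snippets whose dependencies are not installed.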

FATAL · domain_rule

WHEN: When designing a workflow where the index is persisted to storage today and re-loaded later (possibly by a different process / different developer) for query

ACTION: Do not rely on storage_context to remember which embedder built the index. Treat the embed model identity as caller-managed state: always reconstruct the index with the same explicit embed_model that was used at index time, or fail loudly when re-loading. Read llamaindex-C-004 for the remedy.

CONSEQUENCE: undefined behavior

domain-constraint

FATAL · domain_rule

WHEN: When persisting an index to disk / vector store today for later re-load and query

ACTION: At index time: write a sidecar file (e.g. {storage_dir}/embed_model.json) with {'provider_class': type(embed_model).__module__ + '.' + type(embed_model).__name__, 'model_name': getattr(embed_model, 'model_name', None), 'embed_dim': getattr(embed_model, 'embed_dim', None) or len(embed_model.get_text_embedding('probe'))}. At re-load: read the sidecar, compare against Settings.embed_model or the embed_model passed to load_index_from_storage, raise EmbedModelMismatchError on any drift. Do not fall back to the new embedder.

CONSEQUENCE: undefined behavior

domain-constraint
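The sidecar approach described in the ACTION above could be sketched roughly as follows. This is one possible shape, not a LlamaIndex API: `EmbedModelMismatchError` is defined locally here, and the helper names `describe_embed_model`, `write_sidecar`, and `verify_sidecar` are hypothetical. Note that the dimension probe calls `get_text_embedding`, which for a hosted embedder is a real (billable) request.

```python
import json
from pathlib import Path

SIDECAR_NAME = "embed_model.json"


class EmbedModelMismatchError(RuntimeError):
    """Raised when the embedder at load time differs from the one at index time."""


def describe_embed_model(embed_model):
    """Collect the identity fields recorded in the sidecar file."""
    cls = type(embed_model)
    # Fall back to a probe embedding when the model exposes no embed_dim.
    dim = (getattr(embed_model, "embed_dim", None)
           or len(embed_model.get_text_embedding("probe")))
    return {
        "provider_class": f"{cls.__module__}.{cls.__name__}",
        "model_name": getattr(embed_model, "model_name", None),
        "embed_dim": dim,
    }


def write_sidecar(storage_dir, embed_model):
    """Call at index time, right after persisting the index."""
    path = Path(storage_dir) / SIDECAR_NAME
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(describe_embed_model(embed_model), indent=2),
                    encoding="utf-8")
    return path


def verify_sidecar(storage_dir, embed_model):
    """Call before re-loading; raises instead of falling back to a new embedder."""
    recorded = json.loads(
        (Path(storage_dir) / SIDECAR_NAME).read_text(encoding="utf-8"))
    current = describe_embed_model(embed_model)
    drift = {key: (recorded.get(key), value)
             for key, value in current.items() if recorded.get(key) != value}
    if drift:
        raise EmbedModelMismatchError(f"embed model drift (recorded, current): {drift}")
```

In this sketch a missing sidecar surfaces as a `FileNotFoundError`, which also fails loudly rather than silently querying with a different embedder.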


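Update history

v0.1.0 · 2026-04-25 · Contributor: tangweigang-jpg

v0.1.0: First release on Doramagic.ai. A RAG framework skill based on run-llama/llama_index, bilingual (Chinese and English), with 52 anti-pattern constraints (5 fatal) and 3 FAQ entries.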
