# Pitfall Log

Project: microsoft/markitdown

Summary: 30 potential pitfalls were identified, 4 of them high/blocking. Top priority: installation pitfall, source evidence: [Bug]: RuntimeWarning from pydub: "Couldn't find ffmpeg or avconv" on Linux.

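## 1. Installation pitfall · Source evidence: [Bug]: RuntimeWarning from pydub: "Couldn't find ffmpeg or avconv" on Linux

- Severity: high
- Evidence strength: source_linked
- Finding: GitHub community evidence points to an unverified installation-related issue in this project: [Bug]: RuntimeWarning from pydub: "Couldn't find ffmpeg or avconv" on Linux
- User impact: May raise the cost of trial use for new users and of production integration.
- Recommended check: The source issue is still open; Pack Agent needs to re-check whether it still affects the current version.
- Guardrail: Do not escalate this into a definitive conclusion detached from the source link; note the applicable version and review status.
- Evidence: community_evidence:github | cevd_f70b2e3ea5ed47418a4aeb9ef27230f9 | https://github.com/microsoft/markitdown/issues/1685 | The source discussion mentions Python-related conditions; re-check before installing or trying out.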
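
One way to re-check this report on a given machine is to see whether either executable named in the warning is on PATH. The sketch below uses only the standard library; the helper name `find_audio_backends` is hypothetical and is not part of markitdown or pydub.

```python
import shutil


def find_audio_backends(names=("ffmpeg", "avconv")):
    """Map each executable name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    found = find_audio_backends()
    for name, path in found.items():
        print(f"{name}: {path or 'not found'}")
    if not any(found.values()):
        # Assumption: the warning in the linked issue is tied to both lookups failing.
        print("Neither executable is on PATH; the pydub warning is likely on this host.")
```

Running this before and after installing ffmpeg gives a simple record of the environment to attach to the review status.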

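## 2. Runtime pitfall · Source evidence: Unrecognized Arguments Error in markitdown CLI for undocumented arguments

- Severity: high
- Evidence strength: source_linked
- Finding: GitHub community evidence points to an unverified runtime-related issue in this project: Unrecognized Arguments Error in markitdown CLI for undocumented arguments
- User impact: May raise the cost of trial use for new users and of production integration.
- Recommended check: The source issue is still open; Pack Agent needs to re-check whether it still affects the current version.
- Guardrail: Do not escalate this into a definitive conclusion detached from the source link; note the applicable version and review status.
- Evidence: community_evidence:github | cevd_252ef0d45ac040688ffa066bc1b64ba0 | https://github.com/microsoft/markitdown/issues/1897 | Unverified usage condition surfaced by source type github_issue.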

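## 3. Maintenance pitfall · Source evidence: bug: DOCX math converter crashes when oMath element is missing in malformed equations

- Severity: high
- Evidence strength: source_linked
- Finding: GitHub community evidence points to an unverified maintenance/version-related issue in this project: bug: DOCX math converter crashes when oMath element is missing in malformed equations
- User impact: May block installation or the first run.
- Recommended check: The source issue is still open; Pack Agent needs to re-check whether it still affects the current version.
- Guardrail: Do not escalate this into a definitive conclusion detached from the source link; note the applicable version and review status.
- Evidence: community_evidence:github | cevd_6e08b71ee29f46a98e6825a5d5b11e6e | https://github.com/microsoft/markitdown/issues/1979 | The source discussion mentions Python-related conditions; re-check before installing or trying out.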

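## 4. Maintenance pitfall · Source evidence: bug: DOCX math converter crashes with NotImplementedError on unknown functions

- Severity: high
- Evidence strength: source_linked
- Finding: GitHub community evidence points to an unverified maintenance/version-related issue in this project: bug: DOCX math converter crashes with NotImplementedError on unknown functions
- User impact: May block installation or the first run.
- Recommended check: The source issue is still open; Pack Agent needs to re-check whether it still affects the current version.
- Guardrail: Do not escalate this into a definitive conclusion detached from the source link; note the applicable version and review status.
- Evidence: community_evidence:github | cevd_439f22f47a524773808819148caadca5 | https://github.com/microsoft/markitdown/issues/1982 | The source discussion mentions Python-related conditions; re-check before installing or trying out.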
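
For crash reports like this one, a batch job can isolate each conversion so that one malformed document does not stop the whole run. The following is a minimal sketch, not a fix for the reported bug; `safe_convert` is a hypothetical helper, and the `convert` callable is whatever conversion function the caller supplies.

```python
from typing import Callable, Optional, Tuple


def safe_convert(
    convert: Callable[[str], str], path: str
) -> Tuple[Optional[str], Optional[Exception]]:
    """Call convert(path); return (text, None) on success or (None, error) on failure."""
    try:
        return convert(path), None
    except Exception as error:  # broad on purpose: the linked reports describe different crash types
        return None, error


if __name__ == "__main__":
    def failing_converter(path: str) -> str:
        # Stand-in for a converter that hits an unsupported construct.
        raise NotImplementedError(f"unsupported construct in {path}")

    text, error = safe_convert(failing_converter, "equations.docx")
    print(text, type(error).__name__)
```

Recording the exception type and the input path alongside the installed version makes it easier to tell later whether the open issue still reproduces.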

## 5. Installation pitfall · Failure mode: installation: Office Open XML: Invalid Files Return Success with Error Message Instead of Exception

- Severity: medium
- Evidence strength: source_linked
- Finding: Developers should check this installation risk before relying on the project: Office Open XML: Invalid Files Return Success with Error Message Instead of Exception
- User impact: Developers may fail before the first successful local run: Office Open XML: Invalid Files Return Success with Error Message Instead of Exception
- Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: Office Open XML: Invalid Files Return Success with Error Message Instead of Exception. Context: the source discussion did not expose a precise runtime context.
- Guardrail: State this as source-backed community evidence, not as a Doramagic reproduction.
- Evidence: failure_mode_cluster:github_issue | fmev_087a8a7b6538b2ce2b065ade73c555af | https://github.com/microsoft/markitdown/issues/1408 | Office Open XML: Invalid Files Return Success with Error Message Instead of Exception
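
If invalid files can come back as an apparent success, callers who need to fail fast can validate the input themselves before converting. The sketch below relies on the fact that Office Open XML packages are ZIP archives with a `[Content_Types].xml` part at the root; the helper name `looks_like_ooxml` is hypothetical, and the check is deliberately shallow.

```python
import io
import zipfile


def looks_like_ooxml(data: bytes) -> bool:
    """Return True if data is a ZIP archive containing a root [Content_Types].xml part."""
    buffer = io.BytesIO(data)
    if not zipfile.is_zipfile(buffer):
        return False
    buffer.seek(0)
    try:
        with zipfile.ZipFile(buffer) as archive:
            return "[Content_Types].xml" in archive.namelist()
    except zipfile.BadZipFile:
        return False


if __name__ == "__main__":
    print(looks_like_ooxml(b"this is plain text, not a package"))
```

A file that fails this check can be rejected with an explicit exception in the caller, independent of how the library reports the problem.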

## 6. Installation pitfall · Failure mode: installation: Support for .doc extensions

- Severity: medium
- Evidence strength: source_linked
- Finding: Developers should check this installation risk before relying on the project: Support for .doc extensions
- User impact: Developers may fail before the first successful local run: Support for .doc extensions
- Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: Support for .doc extensions. Context: observed with Windows and Linux.
- Guardrail: State this as source-backed community evidence, not as a Doramagic reproduction.
- Evidence: failure_mode_cluster:github_issue | fmev_d5a467d012987779306cb5c50725275b | https://github.com/microsoft/markitdown/issues/23 | Support for .doc extensions

## 7. Installation pitfall · Failure mode: installation: [Bug]: RuntimeWarning from pydub: "Couldn't find ffmpeg or avconv" on Linux

- Severity: medium
- Evidence strength: source_linked
- Finding: Developers should check this installation risk before relying on the project: [Bug]: RuntimeWarning from pydub: "Couldn't find ffmpeg or avconv" on Linux
- User impact: Developers may fail before the first successful local run: [Bug]: RuntimeWarning from pydub: "Couldn't find ffmpeg or avconv" on Linux
- Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: [Bug]: RuntimeWarning from pydub: "Couldn't find ffmpeg or avconv" on Linux. Context: observed with Python, Windows, and Linux.
- Guardrail: State this as source-backed community evidence, not as a Doramagic reproduction.
- Evidence: failure_mode_cluster:github_issue | fmev_1f9167a15a1eec72c8f79514f1b70b76 | https://github.com/microsoft/markitdown/issues/1685 | [Bug]: RuntimeWarning from pydub: "Couldn't find ffmpeg or avconv" on Linux

## 8. Installation pitfall · Failure mode: installation: v0.1.0

- Severity: medium
- Evidence strength: source_linked
- Finding: Developers should check this installation risk before relying on the project: v0.1.0
- User impact: Upgrade or migration may change expected behavior: v0.1.0
- Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: v0.1.0. Context: observed with Python.
- Guardrail: State this as source-backed community evidence, not as a Doramagic reproduction.
- Evidence: failure_mode_cluster:github_release | fmev_1d5ae6ee21225356f45c36c20024dccd | https://github.com/microsoft/markitdown/releases/tag/v0.1.0 | v0.1.0

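## 9. Installation pitfall · Source evidence: Office Open XML: Invalid Files Return Success with Error Message Instead of Exception

- Severity: medium
- Evidence strength: source_linked
- Finding: GitHub community evidence points to an unverified installation-related issue in this project: Office Open XML: Invalid Files Return Success with Error Message Instead of Exception
- User impact: May raise the cost of trial use for new users and of production integration.
- Recommended check: The source issue is still open; Pack Agent needs to re-check whether it still affects the current version.
- Guardrail: Do not escalate this into a definitive conclusion detached from the source link; note the applicable version and review status.
- Evidence: community_evidence:github | cevd_734e117518a3496eb3779e5f22b600b5 | https://github.com/microsoft/markitdown/issues/1408 | Unverified usage condition surfaced by source type github_issue.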

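## 10. Installation pitfall · Source evidence: bug: IpynbConverter.accepts() raises UnicodeDecodeError on non-ASCII files (French PDFs, etc.)

- Severity: medium
- Evidence strength: source_linked
- Finding: GitHub community evidence points to an unverified installation-related issue in this project: bug: IpynbConverter.accepts() raises UnicodeDecodeError on non-ASCII files (French PDFs, etc.)
- User impact: May block installation or the first run.
- Recommended check: The source issue is still open; Pack Agent needs to re-check whether it still affects the current version.
- Guardrail: Do not escalate this into a definitive conclusion detached from the source link; note the applicable version and review status.
- Evidence: community_evidence:github | cevd_77597bea6262485b9609d8fc5f50a69a | https://github.com/microsoft/markitdown/issues/1894 | The source discussion mentions Python-related conditions; re-check before installing or trying out.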
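
The general failure pattern in the issue title, a format sniffer that decodes arbitrary bytes as text, can be illustrated without touching markitdown itself. The sketch below is not the project's implementation; `looks_like_notebook` is a hypothetical helper that treats undecodable input as "not a notebook" instead of raising. It assumes only that Jupyter notebooks are UTF-8 JSON objects with a top-level `nbformat` key.

```python
import json


def looks_like_notebook(data: bytes) -> bool:
    """Return True only if data decodes as UTF-8 JSON and has a top-level 'nbformat' key."""
    try:
        document = json.loads(data.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        # Binary or non-UTF-8 input is simply not a notebook; do not propagate the error.
        return False
    return isinstance(document, dict) and "nbformat" in document


if __name__ == "__main__":
    print(looks_like_notebook(b'{"nbformat": 4, "cells": []}'))
    print(looks_like_notebook(b"%PDF-1.7 \xe9\xe8"))
```

A reproduction attempt for the linked issue can use the same kind of input: a file whose leading bytes are not valid UTF-8.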

## 11. Configuration pitfall · Failure mode: configuration: Enhancement: Add MCP server support for document processing

- Severity: medium
- Evidence strength: source_linked
- Finding: Developers should check this configuration risk before relying on the project: Enhancement: Add MCP server support for document processing
- User impact: Developers may misconfigure credentials, environment, or host setup: Enhancement: Add MCP server support for document processing
- Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: Enhancement: Add MCP server support for document processing. Context: the source discussion did not expose a precise runtime context.
- Guardrail: State this as source-backed community evidence, not as a Doramagic reproduction.
- Evidence: failure_mode_cluster:github_issue | fmev_969d5f508051e086435b78736eae3e88 | https://github.com/microsoft/markitdown/issues/2004 | Enhancement: Add MCP server support for document processing

## 12. Configuration pitfall · Failure mode: configuration: v0.1.2

- Severity: medium
- Evidence strength: source_linked
- Finding: Developers should check this configuration risk before relying on the project: v0.1.2
- User impact: Upgrade or migration may change expected behavior: v0.1.2
- Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: v0.1.2. Context: observed with Python.
- Guardrail: State this as source-backed community evidence, not as a Doramagic reproduction.
- Evidence: failure_mode_cluster:github_release | fmev_076605feea6e0b4830282709121d3c90 | https://github.com/microsoft/markitdown/releases/tag/v0.1.2 | v0.1.2

## 13. Configuration pitfall · Failure mode: configuration: v0.1.2a1

- Severity: medium
- Evidence strength: source_linked
- Finding: Developers should check this configuration risk before relying on the project: v0.1.2a1
- User impact: Upgrade or migration may change expected behavior: v0.1.2a1
- Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: v0.1.2a1. Context: observed with Python.
- Guardrail: State this as source-backed community evidence, not as a Doramagic reproduction.
- Evidence: failure_mode_cluster:github_release | fmev_22fa2fa9d8ed93f594844ce5550fc4d8 | https://github.com/microsoft/markitdown/releases/tag/v0.1.2a1 | v0.1.2a1

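## 14. Configuration pitfall · Source evidence: Enhancement: Add MCP server support for document processing

- Severity: medium
- Evidence strength: source_linked
- Finding: GitHub community evidence points to an unverified configuration-related issue in this project: Enhancement: Add MCP server support for document processing
- User impact: May raise the cost of trial use for new users and of production integration.
- Recommended check: The source issue is still open; Pack Agent needs to re-check whether it still affects the current version.
- Guardrail: Do not escalate this into a definitive conclusion detached from the source link; note the applicable version and review status.
- Evidence: community_evidence:github | cevd_94fcd5bbf87541d1ab988bae7c501a95 | https://github.com/microsoft/markitdown/issues/2004 | Unverified usage condition surfaced by source type github_issue.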

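## 15. Capability pitfall · Capability assessment depends on assumptions

- Severity: medium
- Evidence strength: source_linked
- Finding: README/documentation is current enough for a first validation pass.
- User impact: If the assumption does not hold, users will not get the promised capability.
- Recommended check: Turn the assumption into a downstream verification checklist.
- Guardrail: Assumptions must become verification items; do not state them as fact before verification results exist.
- Evidence: capability.assumptions | github_repo:888092115 | https://github.com/microsoft/markitdown | README/documentation is current enough for a first validation pass.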

## 16. Runtime pitfall · Failure mode: runtime: bug: DOCX math converter crashes when oMath element is missing in malformed equations

- Severity: medium
- Evidence strength: source_linked
- Finding: Developers should check this runtime risk before relying on the project: bug: DOCX math converter crashes when oMath element is missing in malformed equations
- User impact: Developers may hit a documented source-backed failure mode: bug: DOCX math converter crashes when oMath element is missing in malformed equations
- Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: bug: DOCX math converter crashes when oMath element is missing in malformed equations. Context: observed with Python.
- Guardrail: State this as source-backed community evidence, not as a Doramagic reproduction.
- Evidence: failure_mode_cluster:github_issue | fmev_2d85aabe3c00f8d53d781ac03dd69f62 | https://github.com/microsoft/markitdown/issues/1979 | bug: DOCX math converter crashes when oMath element is missing in malformed equations

## 17. Runtime pitfall · Failure mode: runtime: bug: DOCX math converter crashes with NotImplementedError on unknown functions

- Severity: medium
- Evidence strength: source_linked
- Finding: Developers should check this runtime risk before relying on the project: bug: DOCX math converter crashes with NotImplementedError on unknown functions
- User impact: Developers may hit a documented source-backed failure mode: bug: DOCX math converter crashes with NotImplementedError on unknown functions
- Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: bug: DOCX math converter crashes with NotImplementedError on unknown functions. Context: observed with Python.
- Guardrail: State this as source-backed community evidence, not as a Doramagic reproduction.
- Evidence: failure_mode_cluster:github_issue | fmev_3ca154355492e590afae4917a8e9c7af | https://github.com/microsoft/markitdown/issues/1982 | bug: DOCX math converter crashes with NotImplementedError on unknown functions

## 18. Runtime pitfall · Failure mode: runtime: bug: IpynbConverter.accepts() raises UnicodeDecodeError on non-ASCII files (French PDFs, etc.)

- Severity: medium
- Evidence strength: source_linked
- Finding: Developers should check this runtime risk before relying on the project: bug: IpynbConverter.accepts() raises UnicodeDecodeError on non-ASCII files (French PDFs, etc.)
- User impact: Developers may hit a documented source-backed failure mode: bug: IpynbConverter.accepts() raises UnicodeDecodeError on non-ASCII files (French PDFs, etc.)
- Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: bug: IpynbConverter.accepts() raises UnicodeDecodeError on non-ASCII files (French PDFs, etc.). Context: observed with Python and Windows.
- Guardrail: State this as source-backed community evidence, not as a Doramagic reproduction.
- Evidence: failure_mode_cluster:github_issue | fmev_2603b970a28eceb8da6246b000a927d3 | https://github.com/microsoft/markitdown/issues/1894 | bug: IpynbConverter.accepts() raises UnicodeDecodeError on non-ASCII files (French PDFs, etc.)

## 19. Runtime pitfall · Failure mode: runtime: v0.1.3

- Severity: medium
- Evidence strength: source_linked
- Finding: Developers should check this runtime risk before relying on the project: v0.1.3
- User impact: Upgrade or migration may change expected behavior: v0.1.3
- Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: v0.1.3. Context: observed with Windows.
- Guardrail: State this as source-backed community evidence, not as a Doramagic reproduction.
- Evidence: failure_mode_cluster:github_release | fmev_994386694bb3b31fb731336f58573ff3 | https://github.com/microsoft/markitdown/releases/tag/v0.1.3 | v0.1.3

## 20. Runtime pitfall · Failure mode: runtime: v0.1.5

- Severity: medium
- Evidence strength: source_linked
- Finding: Developers should check this runtime risk before relying on the project: v0.1.5
- User impact: Upgrade or migration may change expected behavior: v0.1.5
- Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: v0.1.5. Context: observed with Windows.
- Guardrail: State this as source-backed community evidence, not as a Doramagic reproduction.
- Evidence: failure_mode_cluster:github_release | fmev_38cc2743269efc75c24242abb0e2746c | https://github.com/microsoft/markitdown/releases/tag/v0.1.5 | v0.1.5

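## 21. Runtime pitfall · Source evidence: Timeout needed

- Severity: medium
- Evidence strength: source_linked
- Finding: GitHub community evidence points to an unverified runtime-related issue in this project: Timeout needed
- User impact: May raise the cost of trial use for new users and of production integration.
- Recommended check: The source issue is still open; Pack Agent needs to re-check whether it still affects the current version.
- Guardrail: Do not escalate this into a definitive conclusion detached from the source link; note the applicable version and review status.
- Evidence: community_evidence:github | cevd_ba28a1cc5c004225b80d2ef380e51a77 | https://github.com/microsoft/markitdown/issues/2000 | Unverified usage condition surfaced by source type github_issue.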
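
Until the request in the linked issue is resolved, a caller can enforce a time limit from the outside by running the conversion in a child process. This is a minimal sketch under that assumption, not a feature of markitdown; `run_with_timeout` is a hypothetical helper, and the demo uses a stand-in Python command in place of a real conversion.

```python
import subprocess
import sys
from typing import Optional, Sequence


def run_with_timeout(command: Sequence[str], timeout_seconds: float) -> Optional[str]:
    """Run command and return its stdout, or None if it exceeds timeout_seconds.

    A non-zero exit status raises subprocess.CalledProcessError (check=True).
    """
    try:
        completed = subprocess.run(
            list(command),
            capture_output=True,
            text=True,
            timeout=timeout_seconds,
            check=True,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing is left running.
        return None
    return completed.stdout


if __name__ == "__main__":
    # Stand-in child process; replace with the conversion command used in your setup.
    print(run_with_timeout([sys.executable, "-c", "print('done')"], 10.0))
```

Returning None on timeout keeps the slow-input case distinct from a crash, which surfaces as an exception.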

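## 22. Maintenance pitfall · Maintenance activity unknown

- Severity: medium
- Evidence strength: source_linked
- Finding: last_activity_observed is not recorded.
- User impact: New, abandoned, and active projects become indistinguishable, which lowers confidence in the recommendation.
- Recommended check: Add GitHub signals for recent commits, releases, and issue/PR responses.
- Guardrail: While maintenance activity is unknown, the recommendation must not be marked high-trust.
- Evidence: evidence.maintainer_signals | github_repo:888092115 | https://github.com/microsoft/markitdown | last_activity_observed missing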

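## 23. Security/permissions pitfall · Downstream validation found risk items

- Severity: medium
- Evidence strength: source_linked
- Finding: no_demo
- User impact: Downstream has already requested a review; this must not be played down on the page.
- Recommended check: Add to the security/permissions governance review queue.
- Guardrail: While a downstream risk exists, keep the review/recommendation downgraded.
- Evidence: downstream_validation.risk_items | github_repo:888092115 | https://github.com/microsoft/markitdown | no_demo; severity=medium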

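## 24. Security/permissions pitfall · Scoring risk present

- Severity: medium
- Evidence strength: source_linked
- Finding: no_demo
- User impact: The risk affects whether the project is suitable for ordinary users to install.
- Recommended check: Record the risk in the boundary card and confirm whether manual review is needed.
- Guardrail: Scoring risks must appear in the boundary card, not only as an internal score.
- Evidence: risks.scoring_risks | github_repo:888092115 | https://github.com/microsoft/markitdown | no_demo; severity=medium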

## 25. Capability pitfall · Failure mode: conceptual: Unrecognized Arguments Error in markitdown CLI for undocumented arguments

- Severity: low
- Evidence strength: source_linked
- Finding: Developers should check this conceptual risk before relying on the project: Unrecognized Arguments Error in markitdown CLI for undocumented arguments
- User impact: Developers may hit a documented source-backed failure mode: Unrecognized Arguments Error in markitdown CLI for undocumented arguments
- Recommended check: Review the source-backed failure mode cluster and record the applicable version and verification path in the asset.
- Guardrail: State this as source-backed community evidence, not as a Doramagic reproduction.
- Evidence: failure_mode_cluster:github_issue | fmev_d7ddfa04bce33d2ca53c58ce9f0265c0 | https://github.com/microsoft/markitdown/issues/1897 | Unrecognized Arguments Error in markitdown CLI for undocumented arguments

## 26. Runtime pitfall · Failure mode: performance: Timeout needed

- Severity: low
- Evidence strength: source_linked
- Finding: Developers should check this performance risk before relying on the project: Timeout needed
- User impact: Developers may hit a documented source-backed failure mode: Timeout needed
- Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: Timeout needed. Context: the source discussion did not expose a precise runtime context.
- Guardrail: State this as source-backed community evidence, not as a Doramagic reproduction.
- Evidence: failure_mode_cluster:github_issue | fmev_acdf8e881ef175760bcc59b92eae1aef | https://github.com/microsoft/markitdown/issues/2000 | Timeout needed

## 27. Runtime pitfall · Failure mode: performance: Version 0.1.6

- Severity: low
- Evidence strength: source_linked
- Finding: Developers should check this performance risk before relying on the project: Version 0.1.6
- User impact: Upgrade or migration may change expected behavior: Version 0.1.6
- Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: Version 0.1.6. Context: the source discussion did not expose a precise runtime context.
- Guardrail: State this as source-backed community evidence, not as a Doramagic reproduction.
- Evidence: failure_mode_cluster:github_release | fmev_037f240b8fd9da8ecfc973a1f7eae18c | https://github.com/microsoft/markitdown/releases/tag/v0.1.6 | Version 0.1.6

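## 28. Maintenance pitfall · Issue/PR response quality unknown

- Severity: low
- Evidence strength: source_linked
- Finding: issue_or_pr_quality=unknown.
- User impact: Users cannot tell whether anyone will maintain the project when they hit a problem.
- Recommended check: Sample recent issues/PRs to see whether they go unhandled for long periods.
- Guardrail: While issue/PR responsiveness is unknown, flag the maintenance risk.
- Evidence: evidence.maintainer_signals | github_repo:888092115 | https://github.com/microsoft/markitdown | issue_or_pr_quality=unknown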

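## 29. Maintenance pitfall · Release cadence unclear

- Severity: low
- Evidence strength: source_linked
- Finding: release_recency=unknown.
- User impact: Install commands and documentation may lag behind the code, making pitfalls more likely.
- Recommended check: Confirm that the latest release/tag matches the README install command.
- Guardrail: While release cadence is unknown or stale, install instructions must be marked as possibly out of date.
- Evidence: evidence.maintainer_signals | github_repo:888092115 | https://github.com/microsoft/markitdown | release_recency=unknown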
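
Part of this check can be done locally by recording which version is actually installed and comparing it by hand with the latest release tag. The sketch below uses the standard library's `importlib.metadata`; `installed_version` is a hypothetical helper, and the distribution name `markitdown` is assumed from the project name.

```python
from importlib import metadata
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version string for a distribution, or None if it is not installed."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Assumed distribution name; adjust if the package is published under another name.
    print(installed_version("markitdown") or "not installed")
```

Noting the printed version next to each pitfall entry also covers the "applicable version" field that the guardrails in this log ask for.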

## 30. Maintenance pitfall · Failure mode: maintenance: Version 0.1.5b1

- Severity: low
- Evidence strength: source_linked
- Finding: Developers should check this maintenance risk before relying on the project: Version 0.1.5b1
- User impact: Upgrade or migration may change expected behavior: Version 0.1.5b1
- Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: Version 0.1.5b1. Context: the source discussion did not expose a precise runtime context.
- Guardrail: State this as source-backed community evidence, not as a Doramagic reproduction.
- Evidence: failure_mode_cluster:github_release | fmev_fe15b901250727fa3263b3b5af451b94 | https://github.com/microsoft/markitdown/releases/tag/v0.1.5b1 | Version 0.1.5b1
