# Pitfall Log

Project: jina-ai/reader

Summary: 28 potential pitfalls found, 2 of them high/blocking. Top priority: security/permissions pitfall, failure mode security_permissions: Server-Side Request Forgery via domain resolution bypass in self-hosted deployments.

## 1. Security/permissions pitfall · Failure mode: security_permissions: Server-Side Request Forgery via domain resolution bypass in self-hosted deployments

- Severity: high
- Evidence strength: source_linked
- Finding: Developers should check this security_permissions risk before relying on the project: Server-Side Request Forgery via domain resolution bypass in self-hosted deployments
- User impact: Developers may expose sensitive permissions or credentials: Server-Side Request Forgery via domain resolution bypass in self-hosted deployments
- Evidence: failure_mode_cluster:github_issue | https://github.com/jina-ai/reader/issues/1253 | Server-Side Request Forgery via domain resolution bypass in self-hosted deployments
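
The issue title suggests that a hostname-level check can be sidestepped once DNS resolution is involved. The following is a minimal sketch of one common mitigation, assuming a self-hosted deployment that wants to refuse any target whose hostname resolves to a non-public address. It is not the project's code: `is_public_ip` and `resolve_and_check` are hypothetical names introduced here for illustration.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def is_public_ip(ip_text: str) -> bool:
    """Return True only for globally routable addresses."""
    ip = ipaddress.ip_address(ip_text)
    # Unwrap IPv4-mapped IPv6 (::ffff:a.b.c.d) so the IPv4 rules apply.
    if ip.version == 6 and ip.ipv4_mapped is not None:
        ip = ip.ipv4_mapped
    return ip.is_global


def resolve_and_check(url: str, resolver=socket.getaddrinfo) -> list[str]:
    """Resolve the URL's host and return its addresses.

    Raises ValueError if the scheme is unsupported, the host does not
    resolve, or any resolved address is non-public (hypothetical policy).
    """
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError(f"unsupported URL: {url!r}")
    port = parts.port or (443 if parts.scheme == "https" else 80)
    infos = resolver(parts.hostname, port, type=socket.SOCK_STREAM)
    addresses = sorted({info[4][0] for info in infos})
    blocked = [a for a in addresses if not is_public_ip(a)]
    if not addresses or blocked:
        raise ValueError(
            f"{parts.hostname} resolves to non-public addresses: {blocked}"
        )
    return addresses
```

A check like this only helps if the later connection goes to one of the returned addresses. If the HTTP client resolves the hostname again on its own, the answer can differ between the check and the request.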

## 2. Security/permissions pitfall · Failure mode: security_permissions: Unauthenticated SSRF via unvalidated HTTP redirects (single-shot SSRF gate not re-applied per...

- Severity: high
- Evidence strength: source_linked
- Finding: Developers should check this security_permissions risk before relying on the project: Unauthenticated SSRF via unvalidated HTTP redirects (single-shot SSRF gate not re-applied per redirect hop)
- User impact: Developers may expose sensitive permissions or credentials: Unauthenticated SSRF via unvalidated HTTP redirects (single-shot SSRF gate not re-applied per redirect hop)
- Evidence: failure_mode_cluster:github_issue | https://github.com/jina-ai/reader/issues/1252 | Unauthenticated SSRF via unvalidated HTTP redirects (single-shot SSRF gate not re-applied per redirect hop)
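
The issue title describes a gate that runs once on the initial URL and is not re-applied on each redirect hop. A minimal sketch of the per-hop alternative follows, assuming redirects are followed manually. It is not the project's implementation: `fetch_with_checked_redirects`, `fetch_once`, and `is_allowed` are hypothetical names, and `fetch_once` stands in for any HTTP call that has automatic redirect following turned off.

```python
from typing import Callable, Optional, Tuple
from urllib.parse import urljoin

REDIRECT_STATUSES = {301, 302, 303, 307, 308}


def fetch_with_checked_redirects(
    url: str,
    fetch_once: Callable[[str], Tuple[int, Optional[str]]],
    is_allowed: Callable[[str], bool],
    max_hops: int = 5,
) -> Tuple[str, int]:
    """Follow redirects manually and validate every hop before fetching it.

    fetch_once(url) must return (status, location_header) and must NOT
    follow redirects itself. Returns (final_url, final_status).
    """
    current = url
    for _ in range(max_hops + 1):
        # The gate runs on every hop, not only on the first URL.
        if not is_allowed(current):
            raise PermissionError(f"blocked URL: {current}")
        status, location = fetch_once(current)
        if status not in REDIRECT_STATUSES or not location:
            return current, status
        # Location may be relative; resolve it against the current URL.
        current = urljoin(current, location)
    raise RuntimeError(f"more than {max_hops} redirects")
```

Because the check sits at the top of the loop, a redirect target that fails the policy is rejected before any request is sent to it.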

## 3. Installation pitfall · Depends on a Docker environment

- Severity: medium
- Evidence strength: runtime_trace
- Finding: The install/run entry point includes a Docker command: `docker run --rm -p 3000:8081 ghcr.io/jina-ai/reader:oss`, followed by `curl http://localhost:3000/https://example.com`.
- User impact: Non-engineering users may not have Docker installed, which noticeably raises the cost of getting started.
- Reproduction command: `docker run --rm -p 3000:8081 ghcr.io/jina-ai/reader:oss`, then `curl http://localhost:3000/https://example.com`
- Evidence: identity.distribution | https://github.com/jina-ai/reader | docker run --rm -p 3000:8081 ghcr.io/jina-ai/reader:oss # then: curl http://localhost:3000/https://example.com
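
The two steps in the command above (start the container, then request a page through it) can be sketched as a small preflight and smoke test. This assumes the container is already running with the port mapping shown in this entry. `docker_available`, `reader_url`, and `fetch_via_reader` are hypothetical helper names, not part of the project.

```python
import shutil
from urllib.request import urlopen

# Host port taken from `-p 3000:8081` in the command recorded in this entry.
READER_BASE = "http://localhost:3000"


def docker_available() -> bool:
    """Return True if a `docker` executable is on PATH."""
    return shutil.which("docker") is not None


def reader_url(target: str, base: str = READER_BASE) -> str:
    """Build the request URL by appending the target URL to the base."""
    return f"{base.rstrip('/')}/{target}"


def fetch_via_reader(target: str, base: str = READER_BASE, timeout: float = 30.0) -> str:
    """Request the target through a running reader container and return the body text."""
    with urlopen(reader_url(target, base), timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

Calling `docker_available()` first gives a clear message to users without Docker, instead of a failed `docker run`.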

## 4. Installation pitfall · Failure mode: installation: npm run build failed because shared files are not found

- Severity: medium
- Evidence strength: source_linked
- Finding: Developers should check this installation risk before relying on the project: npm run build failed because shared files are not found
- User impact: Developers may fail before the first successful local run: npm run build failed because shared files are not found
- Evidence: failure_mode_cluster:github_issue | https://github.com/jina-ai/reader/issues/3 | npm run build failed because shared files are not found

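## 5. Installation pitfall · Source evidence: npm run build failed because shared files are not found

- Severity: medium
- Evidence strength: source_linked
- Finding: GitHub community evidence points to an unverified installation-related issue in this project: npm run build failed because shared files are not found
- User impact: May block installation or the first run.
- Evidence: community_evidence:github | https://github.com/jina-ai/reader/issues/3 | The source discussion mentions npm-related conditions; re-verify before installing or trying the project.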

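## 6. Installation pitfall · Source evidence: support docker deployment

- Severity: medium
- Evidence strength: source_linked
- Finding: GitHub community evidence points to an unverified installation-related issue in this project: support docker deployment
- User impact: May raise the cost of trial use and production adoption for new users.
- Evidence: community_evidence:github | https://github.com/jina-ai/reader/issues/2 | The source discussion mentions docker-related conditions; re-verify before installing or trying the project.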

## 7. Configuration pitfall · Failure mode: configuration: Improve content extraction logic to handle dynamic and hidden elements

- Severity: medium
- Evidence strength: source_linked
- Finding: Developers should check this configuration risk before relying on the project: Improve content extraction logic to handle dynamic and hidden elements
- User impact: Developers may misconfigure credentials, environment, or host setup: Improve content extraction logic to handle dynamic and hidden elements
- Evidence: failure_mode_cluster:github_issue | https://github.com/jina-ai/reader/issues/1242 | Improve content extraction logic to handle dynamic and hidden elements

## 8. Configuration pitfall · Failure mode: configuration: Respect robots.txt and identify your system

- Severity: medium
- Evidence strength: source_linked
- Finding: Developers should check this configuration risk before relying on the project: Respect robots.txt and identify your system
- User impact: Developers may misconfigure credentials, environment, or host setup: Respect robots.txt and identify your system
- Evidence: failure_mode_cluster:github_issue | https://github.com/jina-ai/reader/issues/4 | Respect robots.txt and identify your system
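
For operators who want to honor robots.txt in front of their own deployment, here is a minimal sketch using Python's standard `urllib.robotparser`. It assumes the robots.txt body has already been downloaded. The user agent string and the `allowed_by_robots` helper are hypothetical, and nothing here reflects how the project itself behaves.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical identifier; a real deployment would use its own name and contact URL.
USER_AGENT = "example-reader-bot"


def allowed_by_robots(robots_txt: str, url: str, user_agent: str = USER_AGENT) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file content as an iterable of lines.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

Sending the same identifier in the `User-Agent` request header lets site owners match the traffic they see to the rules they publish.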

## 9. Configuration pitfall · Failure mode: configuration: support docker deployment

- Severity: medium
- Evidence strength: source_linked
- Finding: Developers should check this configuration risk before relying on the project: support docker deployment
- User impact: Developers may misconfigure credentials, environment, or host setup: support docker deployment
- Evidence: failure_mode_cluster:github_issue | https://github.com/jina-ai/reader/issues/2 | support docker deployment

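## 10. Capability pitfall · Capability assessment relies on an assumption

- Severity: medium
- Evidence strength: source_linked
- Finding: README/documentation is current enough for a first validation pass.
- User impact: If the assumption does not hold, users will not get the promised capability.
- Evidence: capability.assumptions | https://github.com/jina-ai/reader | README/documentation is current enough for a first validation pass.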

## 11. Runtime pitfall · Failure mode: runtime: Failed to go to

- Severity: medium
- Evidence strength: source_linked
- Finding: Developers should check this runtime risk before relying on the project: Failed to go to
- User impact: Developers may hit a documented source-backed failure mode: Failed to go to
- Evidence: failure_mode_cluster:github_issue | https://github.com/jina-ai/reader/issues/1118 | Failed to go to

## 12. Runtime pitfall · Failure mode: runtime: Reader doesn't extract any content from this page even though its quite simple?

- Severity: medium
- Evidence strength: source_linked
- Finding: Developers should check this runtime risk before relying on the project: Reader doesn't extract any content from this page even though its quite simple?
- User impact: Developers may hit a documented source-backed failure mode: Reader doesn't extract any content from this page even though its quite simple?
- Evidence: failure_mode_cluster:github_issue | https://github.com/jina-ai/reader/issues/105 | Reader doesn't extract any content from this page even though its quite simple?

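## 13. Runtime pitfall · Source evidence: Failed to go to

- Severity: medium
- Evidence strength: source_linked
- Finding: GitHub community evidence points to an unverified runtime-related issue in this project: Failed to go to
- User impact: May raise the cost of trial use and production adoption for new users.
- Evidence: community_evidence:github | https://github.com/jina-ai/reader/issues/1118 | An unverified usage condition surfaced by source type github_issue.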

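## 14. Maintenance pitfall · Maintenance activity unknown

- Severity: medium
- Evidence strength: source_linked
- Finding: last_activity_observed is not recorded.
- User impact: New, abandoned, and active projects cannot be told apart, which lowers confidence in the recommendation.
- Evidence: evidence.maintainer_signals | https://github.com/jina-ai/reader | last_activity_observed missing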

## 15. Downstream validation risk item · no_demo

- Severity: medium
- Evidence strength: source_linked
- Finding: no_demo
- Evidence: downstream_validation.risk_items | https://github.com/jina-ai/reader | no_demo; severity=medium

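## 16. Security/permissions pitfall · Scoring risk present

- Severity: medium
- Evidence strength: source_linked
- Finding: no_demo
- User impact: The risk affects whether the project is suitable for ordinary users to install.
- Evidence: risks.scoring_risks | https://github.com/jina-ai/reader | no_demo; severity=medium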

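## 17. Security/permissions pitfall · Source evidence: Bug/Optimization: Reader Output size is larger than Raw HTML size

- Severity: medium
- Evidence strength: source_linked
- Finding: GitHub community evidence points to an unverified security/permissions-related issue in this project: Bug/Optimization: Reader Output size is larger than Raw HTML size
- User impact: May affect authorization, key configuration, or security boundaries.
- Evidence: community_evidence:github | https://github.com/jina-ai/reader/issues/1250 | An unverified usage condition surfaced by source type github_issue.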

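## 18. Security/permissions pitfall · Source evidence: Improve content extraction logic to handle dynamic and hidden elements

- Severity: medium
- Evidence strength: source_linked
- Finding: GitHub community evidence points to an unverified security/permissions-related issue in this project: Improve content extraction logic to handle dynamic and hidden elements
- User impact: May affect authorization, key configuration, or security boundaries.
- Evidence: community_evidence:github | https://github.com/jina-ai/reader/issues/1242 | An unverified usage condition surfaced by source type github_issue.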

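## 19. Security/permissions pitfall · Source evidence: Server-Side Request Forgery via domain resolution bypass in self-hosted deployments

- Severity: medium
- Evidence strength: source_linked
- Finding: GitHub community evidence points to an unverified security/permissions-related issue in this project: Server-Side Request Forgery via domain resolution bypass in self-hosted deployments
- User impact: May block installation or the first run.
- Evidence: community_evidence:github | https://github.com/jina-ai/reader/issues/1253 | The source discussion mentions node-related conditions; re-verify before installing or trying the project.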

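## 20. Security/permissions pitfall · Source evidence: Unauthenticated SSRF via unvalidated HTTP redirects (single-shot SSRF gate not re-applied per redirect hop)

- Severity: medium
- Evidence strength: source_linked
- Finding: GitHub community evidence points to an unverified security/permissions-related issue in this project: Unauthenticated SSRF via unvalidated HTTP redirects (single-shot SSRF gate not re-applied per redirect hop)
- User impact: May affect authorization, key configuration, or security boundaries.
- Evidence: community_evidence:github | https://github.com/jina-ai/reader/issues/1252 | The source discussion mentions node-related conditions; re-verify before installing or trying the project.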

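## 21. Security/permissions pitfall · Source evidence: support docker deployment

- Severity: medium
- Evidence strength: source_linked
- Finding: GitHub community evidence points to an unverified security/permissions-related issue in this project: support docker deployment
- User impact: May affect authorization, key configuration, or security boundaries.
- Evidence: community_evidence:github | https://github.com/jina-ai/reader/issues/2 | The source discussion mentions docker-related conditions; re-verify before installing or trying the project.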

## 22. Capability pitfall · Failure mode: capability: Bug/Optimization: Reader Output size is larger than Raw HTML size

- Severity: low
- Evidence strength: source_linked
- Finding: Developers should check this capability risk before relying on the project: Bug/Optimization: Reader Output size is larger than Raw HTML size
- User impact: Developers may hit a documented source-backed failure mode: Bug/Optimization: Reader Output size is larger than Raw HTML size
- Evidence: failure_mode_cluster:github_issue | https://github.com/jina-ai/reader/issues/1250 | Bug/Optimization: Reader Output size is larger than Raw HTML size

## 23. Capability pitfall · Failure mode: capability: Extraction didn't work

- Severity: low
- Evidence strength: source_linked
- Finding: Developers should check this capability risk before relying on the project: Extraction didn't work
- User impact: Developers may hit a documented source-backed failure mode: Extraction didn't work
- Evidence: failure_mode_cluster:github_issue | https://github.com/jina-ai/reader/issues/1 | Extraction didn't work

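## 24. Capability pitfall · Failure mode: capability: Jina Reader 和 Search 被墙了 (Jina Reader and Search are blocked by the Great Firewall)

- Severity: low
- Evidence strength: source_linked
- Finding: Developers should check this capability risk before relying on the project: Jina Reader 和 Search 被墙了 (Jina Reader and Search are blocked by the Great Firewall)
- User impact: Developers may hit a documented source-backed failure mode: Jina Reader 和 Search 被墙了 (Jina Reader and Search are blocked by the Great Firewall)
- Evidence: failure_mode_cluster:github_issue | https://github.com/jina-ai/reader/issues/1237 | Jina Reader 和 Search 被墙了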

## 25. Capability pitfall · Failure mode: capability: Pile in reader format

- Severity: low
- Evidence strength: source_linked
- Finding: Developers should check this capability risk before relying on the project: Pile in reader format
- User impact: Developers may hit a documented source-backed failure mode: Pile in reader format
- Evidence: failure_mode_cluster:github_issue | https://github.com/jina-ai/reader/issues/5 | Pile in reader format

## 26. Capability pitfall · Failure mode: capability: В странном виде сайты открываются. (Sites open in a strange form.)

- Severity: low
- Evidence strength: source_linked
- Finding: Developers should check this capability risk before relying on the project: В странном виде сайты открываются. (Sites open in a strange form.)
- User impact: Developers may hit a documented source-backed failure mode: В странном виде сайты открываются. (Sites open in a strange form.)
- Evidence: failure_mode_cluster:github_issue | https://github.com/jina-ai/reader/issues/1251 | В странном виде сайты открываются.

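## 27. Maintenance pitfall · Issue/PR response quality unknown

- Severity: low
- Evidence strength: source_linked
- Finding: issue_or_pr_quality=unknown.
- User impact: Users cannot tell whether anyone will maintain the project when they run into a problem.
- Evidence: evidence.maintainer_signals | https://github.com/jina-ai/reader | issue_or_pr_quality=unknown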

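## 28. Maintenance pitfall · Release cadence unclear

- Severity: low
- Evidence strength: source_linked
- Finding: release_recency=unknown.
- User impact: Install commands and documentation may lag behind the code, raising the odds that users hit pitfalls.
- Evidence: evidence.maintainer_signals | https://github.com/jina-ai/reader | release_recency=unknown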
