{
  "canonical_name": "vasylenko/markfetch",
  "compilation_id": "pack_8d2e4620b31444bb8fd36fadb9de2b1c",
  "created_at": "2026-05-15T08:31:11.494735+00:00",
  "created_by": "project-pack-compiler",
  "feedback": {
    "carrier_selection_notes": [
      "viable_asset_types=mcp_config, recipe, host_instruction, eval, preflight",
      "recommended_asset_types=mcp_config, recipe, host_instruction, eval, preflight"
    ],
    "evidence_delta": {
      "confirmed_claims": [
        "identity_anchor_present",
        "capability_and_host_targets_present",
        "install_path_declared_or_better"
      ],
      "missing_required_fields": [],
      "must_verify_forwarded": [
        "Run or inspect `npm i -g markfetch` in an isolated environment.",
        "Confirm the project exposes the claimed capability to at least one target host."
      ],
      "quickstart_execution_scope": "allowlisted_sandbox_smoke",
      "sandbox_command": "npm i -g markfetch",
      "sandbox_container_image": "node:22-slim",
      "sandbox_execution_backend": "docker",
      "sandbox_planner_decision": "deterministic_isolated_install",
      "sandbox_validation_id": "sbx_68d591c3193f451280c23122928ef2c6"
    },
    "feedback_event_type": "project_pack_compilation_feedback",
    "learning_candidate_reasons": [],
    "template_gaps": []
  },
  "identity": {
    "canonical_id": "project_f2a92ee4f0af5e4010add90875cdae99",
    "canonical_name": "vasylenko/markfetch",
    "homepage_url": null,
    "license": "unknown",
    "repo_url": "https://github.com/vasylenko/markfetch",
    "slug": "markfetch",
    "source_packet_id": "phit_a4fc82107707459d877159657cf79a72",
    "source_validation_id": "dval_340184719b7f4ddea815de0bc4647491"
  },
  "merchandising": {
    "best_for": "需要工具连接与集成能力，并使用 mcp_host的用户",
    "github_forks": 0,
    "github_stars": 0,
    "one_liner_en": "Tiny CLI and MCP server: fetch an URL -- return clean markdown. Built for AI agents.",
    "one_liner_zh": "Tiny CLI and MCP server: fetch an URL -- return clean markdown. Built for AI agents.",
    "primary_category": {
      "category_id": "tool-integrations",
      "confidence": "high",
      "name_en": "Tool Integrations",
      "name_zh": "工具连接与集成",
      "reason": "matched_keywords:mcp, server, github"
    },
    "target_user": "使用 mcp_host 等宿主 AI 的用户",
    "title_en": "markfetch",
    "title_zh": "markfetch 能力包",
    "visible_tags": [
      {
        "label_en": "Browser Agents",
        "label_zh": "浏览器 Agent",
        "source": "repo_evidence_project_characteristics",
        "tag_id": "product_domain-browser-agents",
        "type": "product_domain"
      },
      {
        "label_en": "Web Task Automation",
        "label_zh": "网页任务自动化",
        "source": "repo_evidence_project_characteristics",
        "tag_id": "user_job-web-task-automation",
        "type": "user_job"
      },
      {
        "label_en": "Browser Automation",
        "label_zh": "浏览器自动化",
        "source": "repo_evidence_project_characteristics",
        "tag_id": "core_capability-browser-automation",
        "type": "core_capability"
      },
      {
        "label_en": "Page Observation and Action Planning",
        "label_zh": "页面观察与动作规划",
        "source": "repo_evidence_project_characteristics",
        "tag_id": "workflow_pattern-page-observation-and-action-planning",
        "type": "workflow_pattern"
      },
      {
        "label_en": "Evaluation Suite",
        "label_zh": "评测体系",
        "source": "repo_evidence_project_characteristics",
        "tag_id": "selection_signal-evaluation-suite",
        "type": "selection_signal"
      }
    ]
  },
  "packet_id": "phit_a4fc82107707459d877159657cf79a72",
  "page_model": {
    "artifacts": {
      "artifact_slug": "markfetch",
      "files": [
        "PROJECT_PACK.json",
        "QUICK_START.md",
        "PROMPT_PREVIEW.md",
        "HUMAN_MANUAL.md",
        "AI_CONTEXT_PACK.md",
        "BOUNDARY_RISK_CARD.md",
        "PITFALL_LOG.md",
        "REPO_INSPECTION.json",
        "REPO_INSPECTION.md",
        "CAPABILITY_CONTRACT.json",
        "EVIDENCE_INDEX.json",
        "CLAIM_GRAPH.json"
      ],
      "required_files": [
        "PROJECT_PACK.json",
        "QUICK_START.md",
        "PROMPT_PREVIEW.md",
        "HUMAN_MANUAL.md",
        "AI_CONTEXT_PACK.md",
        "BOUNDARY_RISK_CARD.md",
        "PITFALL_LOG.md",
        "REPO_INSPECTION.json"
      ]
    },
    "detail": {
      "capability_source": "Project Hit Packet + DownstreamValidationResult",
      "commands": [
        {
          "command": "npm i -g markfetch",
          "label": "Node.js / npm · 官方安装入口",
          "source": "https://github.com/vasylenko/markfetch#readme",
          "verified": true
        }
      ],
      "display_tags": [
        "浏览器 Agent",
        "网页任务自动化",
        "浏览器自动化",
        "页面观察与动作规划",
        "评测体系"
      ],
      "eyebrow": "工具连接与集成",
      "glance": [
        {
          "body": "判断自己是不是目标用户。",
          "label": "最适合谁",
          "value": "需要工具连接与集成能力，并使用 mcp_host的用户"
        },
        {
          "body": "先理解能力边界，再决定是否继续。",
          "label": "核心价值",
          "value": "Tiny CLI and MCP server: fetch an URL -- return clean markdown. Built for AI agents."
        },
        {
          "body": "未完成验证前保持审慎。",
          "label": "继续前",
          "value": "publish to Doramagic.ai project surfaces"
        }
      ],
      "guardrail_source": "Boundary & Risk Card",
      "guardrails": [
        {
          "body": "Prompt Preview 只展示流程，不证明项目已安装或运行。",
          "label": "Check 1",
          "value": "不要把试用当真实运行"
        },
        {
          "body": "mcp_host",
          "label": "Check 2",
          "value": "确认宿主兼容"
        },
        {
          "body": "publish to Doramagic.ai project surfaces",
          "label": "Check 3",
          "value": "先隔离验证"
        }
      ],
      "mode": "mcp_config, recipe, host_instruction, eval, preflight",
      "pitfall_log": {
        "items": [
          {
            "body": "GitHub 社区证据显示该项目存在一个安装相关的待验证问题：v0.4.1",
            "category": "安装坑",
            "evidence": [
              "community_evidence:github | cevd_749b65614f7b40e0b524f4e932cd4aca | https://github.com/vasylenko/markfetch/releases/tag/v0.4.1 | 来源讨论提到 node 相关条件，需在安装/试用前复核。"
            ],
            "severity": "medium",
            "suggested_check": "来源显示可能已有修复、规避或版本变化，说明书中必须标注适用版本。",
            "title": "来源证据：v0.4.1",
            "user_impact": "可能增加新用户试用和生产接入成本。"
          },
          {
            "body": "README/documentation is current enough for a first validation pass.",
            "category": "能力坑",
            "evidence": [
              "capability.assumptions | github_repo:1234238440 | https://github.com/vasylenko/markfetch | README/documentation is current enough for a first validation pass."
            ],
            "severity": "medium",
            "suggested_check": "将假设转成下游验证清单。",
            "title": "能力判断依赖假设",
            "user_impact": "假设不成立时，用户拿不到承诺的能力。"
          },
          {
            "body": "未记录 last_activity_observed。",
            "category": "维护坑",
            "evidence": [
              "evidence.maintainer_signals | github_repo:1234238440 | https://github.com/vasylenko/markfetch | last_activity_observed missing"
            ],
            "severity": "medium",
            "suggested_check": "补 GitHub 最近 commit、release、issue/PR 响应信号。",
            "title": "维护活跃度未知",
            "user_impact": "新项目、停更项目和活跃项目会被混在一起，推荐信任度下降。"
          },
          {
            "body": "no_demo",
            "category": "安全/权限坑",
            "evidence": [
              "downstream_validation.risk_items | github_repo:1234238440 | https://github.com/vasylenko/markfetch | no_demo; severity=medium"
            ],
            "severity": "medium",
            "suggested_check": "进入安全/权限治理复核队列。",
            "title": "下游验证发现风险项",
            "user_impact": "下游已经要求复核，不能在页面中弱化。"
          },
          {
            "body": "no_demo",
            "category": "安全/权限坑",
            "evidence": [
              "risks.scoring_risks | github_repo:1234238440 | https://github.com/vasylenko/markfetch | no_demo; severity=medium"
            ],
            "severity": "medium",
            "suggested_check": "把风险写入边界卡，并确认是否需要人工复核。",
            "title": "存在评分风险",
            "user_impact": "风险会影响是否适合普通用户安装。"
          },
          {
            "body": "issue_or_pr_quality=unknown。",
            "category": "维护坑",
            "evidence": [
              "evidence.maintainer_signals | github_repo:1234238440 | https://github.com/vasylenko/markfetch | issue_or_pr_quality=unknown"
            ],
            "severity": "low",
            "suggested_check": "抽样最近 issue/PR，判断是否长期无人处理。",
            "title": "issue/PR 响应质量未知",
            "user_impact": "用户无法判断遇到问题后是否有人维护。"
          },
          {
            "body": "release_recency=unknown。",
            "category": "维护坑",
            "evidence": [
              "evidence.maintainer_signals | github_repo:1234238440 | https://github.com/vasylenko/markfetch | release_recency=unknown"
            ],
            "severity": "low",
            "suggested_check": "确认最近 release/tag 和 README 安装命令是否一致。",
            "title": "发布节奏不明确",
            "user_impact": "安装命令和文档可能落后于代码，用户踩坑概率升高。"
          }
        ],
        "source": "ProjectPitfallLog + ProjectHitPacket + validation + community signals",
        "summary": "发现 7 个潜在踩坑项，其中 0 个为 high/blocking；最高优先级：安装坑 - 来源证据：v0.4.1。",
        "title": "踩坑日志"
      },
      "snapshot": {
        "contributors": 1,
        "forks": 0,
        "license": "unknown",
        "note": "站点快照，非实时质量证明；用于开工前背景判断。",
        "stars": 0
      },
      "source_url": "https://github.com/vasylenko/markfetch",
      "steps": [
        {
          "body": "不安装项目，先体验能力节奏。",
          "code": "preview",
          "title": "先试 Prompt"
        },
        {
          "body": "理解输入、输出、失败模式和边界。",
          "code": "manual",
          "title": "读说明书"
        },
        {
          "body": "把上下文交给宿主 AI 继续工作。",
          "code": "context",
          "title": "带给 AI"
        },
        {
          "body": "进入主力环境前先完成安装入口与风险边界验证。",
          "code": "verify",
          "title": "沙箱验证"
        }
      ],
      "subtitle": "Tiny CLI and MCP server: fetch an URL -- return clean markdown. Built for AI agents.",
      "title": "markfetch 能力包",
      "trial_prompt": "# markfetch - Prompt Preview\n\n> 复制下面这段 Prompt 到你常用的 AI，先试一次，不需要安装。\n> 它的目标是让你直接体验这个项目的服务方式，而不是阅读项目介绍。\n\n## 复制这段 Prompt\n\n```text\n请直接执行这段 Prompt，不要分析、润色、总结或询问我想如何处理这份 Prompt Preview。\n\n你现在扮演 markfetch 的“安装前体验版”。\n这不是项目介绍、不是评价报告、不是 README 总结。你的任务是让我用最小成本体验它的核心服务。\n\n我的试用任务：我想用它完成一个真实的工具连接与集成任务。\n我常用的宿主 AI：MCP Client\n\n【体验目标】\n围绕我的真实任务，现场演示这个项目如何把输入转成 示例引导, 判断线索。重点是让我感受到工作方式，而不是给我项目背景。\n\n【业务流约束】\n- 你必须像一个正在提供服务的项目能力包，而不是像一个讲解员。\n- 每一轮只推进一个步骤；提出问题后必须停下来等我回答。\n- 每一步都必须让我感受到一个具体服务动作：澄清、整理、规划、检查、判断或收尾。\n- 每一步都要说明：当前目标、你需要我提供什么、我回答后你会产出什么。\n- 不要安装、不要运行命令、不要写代码、不要声称测试通过、不要声称已经修改文件。\n- 需要真实安装或宿主加载后才能验证的内容，必须明确说“这一步需要安装后验证”。\n- 如果我说“用示例继续”，你可以用虚构示例推进，但仍然不能声称真实执行。\n\n【可体验服务能力】\n- 安装前能力预览: Tiny CLI and MCP server: fetch an URL -- return clean markdown. Built for AI agents. 输入：用户任务, 当前 AI 对话上下文；输出：示例引导, 判断线索。\n\n【必须安装后才可验证的能力】\n- 命令行启动或安装流程: 项目文档中存在可执行命令，真实使用需要在本地或宿主环境中运行这些命令。 输入：终端环境, 包管理器, 项目依赖；输出：安装结果, 列表/更新/运行结果。\n\n【核心服务流】\n请严格按这个顺序带我体验。不要一次性输出完整流程：\n1. introduction：Introduction。围绕“Introduction”模拟一次用户任务，不展示安装或运行结果。\n2. quickstart：Quick Start Guide。围绕“Quick Start Guide”模拟一次用户任务，不展示安装或运行结果。\n3. processing-pipeline：Processing Pipeline。围绕“Processing Pipeline”模拟一次用户任务，不展示安装或运行结果。\n4. http-fingerprinting：HTTP/2 Fingerprinting。围绕“HTTP/2 Fingerprinting”模拟一次用户任务，不展示安装或运行结果。\n5. cli-usage：CLI Usage。围绕“CLI Usage”模拟一次用户任务，不展示安装或运行结果。\n\n【核心能力体验剧本】\n每一步都必须按“输入 -> 服务动作 -> 中间产物”执行。不要只说流程名：\n1. introduction\n输入：用户提供的“Introduction”相关信息。\n服务动作：模拟项目在这一步的核心判断和整理方式。\n中间产物：一个可检查的小结果。\n\n2. quickstart\n输入：用户提供的“Quick Start Guide”相关信息。\n服务动作：模拟项目在这一步的核心判断和整理方式。\n中间产物：一个可检查的小结果。\n\n3. processing-pipeline\n输入：用户提供的“Processing Pipeline”相关信息。\n服务动作：模拟项目在这一步的核心判断和整理方式。\n中间产物：一个可检查的小结果。\n\n4. http-fingerprinting\n输入：用户提供的“HTTP/2 Fingerprinting”相关信息。\n服务动作：模拟项目在这一步的核心判断和整理方式。\n中间产物：一个可检查的小结果。\n\n5. 
cli-usage\n输入：用户提供的“CLI Usage”相关信息。\n服务动作：模拟项目在这一步的核心判断和整理方式。\n中间产物：一个可检查的小结果。\n\n【项目服务规则】\n这些规则决定你如何服务用户。不要解释规则本身，而要在每一步执行时遵守：\n- 先确认用户任务、输入材料和成功标准，再模拟项目能力。\n- 每一步都必须形成可检查的小产物，并等待用户确认后再继续。\n- 凡是需要安装、调用工具或访问外部服务的能力，都必须标记为安装后验证。\n\n【每一步的服务约束】\n- Step 1 / introduction：Step 1 必须围绕“Introduction”形成一个小中间产物，并等待用户确认。\n- Step 2 / quickstart：Step 2 必须围绕“Quick Start Guide”形成一个小中间产物，并等待用户确认。\n- Step 3 / processing-pipeline：Step 3 必须围绕“Processing Pipeline”形成一个小中间产物，并等待用户确认。\n- Step 4 / http-fingerprinting：Step 4 必须围绕“HTTP/2 Fingerprinting”形成一个小中间产物，并等待用户确认。\n- Step 5 / cli-usage：Step 5 必须围绕“CLI Usage”形成一个小中间产物，并等待用户确认。\n\n【边界与风险】\n- 不要声称已经安装、运行、调用 API、读写本地文件或完成真实任务。\n- 安装前预览只能展示工作方式，不能证明兼容性、性能或输出质量。\n- 涉及安装、插件加载、工具调用或外部服务的能力必须安装后验证。\n\n【可追溯依据】\n这些路径只用于你内部校验或在我追问“依据是什么”时简要引用。不要在首次回复主动展开：\n- https://github.com/vasylenko/markfetch\n- https://github.com/vasylenko/markfetch#readme\n- README.md\n- src/index.ts\n- package.json\n- src/core.ts\n- src/cli.ts\n\n【首次问题规则】\n- 首次三问必须先确认用户目标、成功标准和边界，不要提前进入工具、安装或实现细节。\n- 如果后续需要技术条件、文件路径或运行环境，必须等用户确认目标后再追问。\n\n首次回复必须只输出下面 4 个部分：\n1. 体验开始：用 1 句话说明你将带我体验 markfetch 的核心服务。\n2. 当前步骤：明确进入 Step 1，并说明这一步要解决什么。\n3. 你会如何服务我：说明你会先改变我完成任务的哪个动作。\n4. 只问我 3 个问题，然后停下等待回答。\n\n首次回复禁止输出：后续完整流程、证据清单、安装命令、项目评价、营销文案、已经安装或运行的说法。\n\nStep 1 / introduction 的二轮协议：\n- 我回答首次三问后，你仍然停留在 Step 1 / introduction，不要进入 Step 2。\n- 第二次回复必须产出 6 个部分：澄清后的任务定义、成功标准、边界条件、\n  2-3 个可选方案、每个方案的权衡、推荐方案。\n- 第二次回复最后必须问我是否确认推荐方案；只有我明确确认后，才能进入下一步。\n- 第二次回复禁止输出 git worktree、代码计划、测试文件、命令或真实执行结果。\n\n后续对话规则：\n- 我回答后，你先完成当前步骤的中间产物并等待确认；只有我确认后，才能进入下一步。\n- 每一步都要生成一个小的中间产物，例如澄清后的目标、计划草案、测试意图、验证清单或继续/停止判断。\n- 所有演示都写成“我会建议/我会引导/这一步会形成”，不要写成已经真实执行。\n- 不要声称已经测试通过、文件已修改、命令已运行或结果已产生。\n- 如果某个能力必须安装后验证，请直接说“这一步需要安装后验证”。\n- 如果证据不足，请明确说“证据不足”，不要补事实。\n```\n",
      "voices": [
        {
          "body": "来源平台：github。github/github_release: v0.4.1（https://github.com/vasylenko/markfetch/releases/tag/v0.4.1）。这些是项目级外部声音，不作为单独质量证明。",
          "items": [
            {
              "kind": "github_release",
              "source": "github",
              "title": "v0.4.1",
              "url": "https://github.com/vasylenko/markfetch/releases/tag/v0.4.1"
            }
          ],
          "status": "已收录 1 条来源",
          "title": "社区讨论"
        }
      ]
    },
    "homepage_card": {
      "category": "工具连接与集成",
      "desc": "Tiny CLI and MCP server: fetch an URL -- return clean markdown. Built for AI agents.",
      "effort": "安装已验证",
      "forks": 0,
      "icon": "link",
      "name": "markfetch 能力包",
      "risk": "需复核",
      "slug": "markfetch",
      "stars": 0,
      "tags": [
        "浏览器 Agent",
        "网页任务自动化",
        "浏览器自动化",
        "页面观察与动作规划",
        "评测体系"
      ],
      "thumb": "gray",
      "type": "MCP 配置"
    },
    "manual": {
      "markdown": "# https://github.com/vasylenko/markfetch 项目说明书\n\n生成时间：2026-05-15 08:07:16 UTC\n\n## 目录\n\n- [Introduction](#introduction)\n- [Quick Start Guide](#quickstart)\n- [Processing Pipeline](#processing-pipeline)\n- [HTTP/2 Fingerprinting](#http-fingerprinting)\n- [CLI Usage](#cli-usage)\n- [MCP Server Integration](#mcp-server)\n- [Environment Variables](#environment-variables)\n- [Write Sandbox Security](#write-sandbox)\n- [Error Handling](#error-handling)\n- [Development Guide](#development)\n\n<a id='introduction'></a>\n\n## Introduction\n\n### 相关页面\n\n相关主题：[Quick Start Guide](#quickstart), [Processing Pipeline](#processing-pipeline)\n\n<details>\n<summary>相关源码文件</summary>\n\n以下源码文件用于生成本页说明：\n\n- [README.md](https://github.com/vasylenko/markfetch/blob/main/README.md)\n- [CHANGELOG.md](https://github.com/vasylenko/markfetch/blob/main/CHANGELOG.md)\n- [src/core.ts](https://github.com/vasylenko/markfetch/blob/main/src/core.ts)\n- [src/cli.ts](https://github.com/vasylenko/markfetch/blob/main/src/cli.ts)\n- [src/mcp.ts](https://github.com/vasylenko/markfetch/blob/main/src/mcp.ts)\n- [src/sandbox.ts](https://github.com/vasylenko/markfetch/blob/main/src/sandbox.ts)\n</details>\n\n# Introduction\n\n## What is markfetch?\n\n**markfetch** is a Node.js tool that fetches public HTTP/S URLs and returns clean, readable markdown — indistinguishable from what a human would get by running \"Save as Markdown\" in a browser. 
It is designed to provide high-quality content extraction for language models, with a focus on producing output that LLM clients can actually consume reliably.\n\n资料来源：[README.md](https://github.com/vasylenko/markfetch/blob/main/README.md)\n\n## Core Design Philosophy\n\nmarkfetch is built around several key principles that differentiate it from generic fetching solutions:\n\n| Principle | Description |\n|-----------|-------------|\n| **Single-channel output** | Returns markdown in `content[0].text` only — no `structuredContent` that some LLM clients drop |\n| **Real-browser fingerprint** | Uses HTTP/2 transport with a coherent Chrome header set and `Sec-CH-UA-*` client hints |\n| **Reader-View extraction** | Leverages Mozilla's Readability library to extract the main article content |\n| **Zero-config defaults** | Works out of the box with sensible defaults |\n| **Deterministic errors** | 8 structured error codes for reliable error handling |\n\n资料来源：[README.md](https://github.com/vasylenko/markfetch/blob/main/README.md)\n\n## Architecture Overview\n\nmarkfetch follows an adapter pattern with a unified core:\n\n```mermaid\ngraph TD\n    A[User / LLM Client] --> B[Adapter Layer]\n    B --> C{Invocation Mode}\n    C -->|CLI args| D[cli.ts]\n    C -->|MCP stdio| E[mcp.ts]\n    D --> F[core.ts - fetchMarkdown]\n    E --> F\n    F --> G[HTTP Fetch - undici]\n    G --> H[Readability Extraction]\n    H --> I[Turndown Conversion]\n    I --> J[Markdown Output]\n```\n\n### Core Components\n\n| Component | File | Responsibility |\n|-----------|------|----------------|\n| **Core Pipeline** | `src/core.ts` | URL fetching, HTML parsing, content extraction, markdown conversion, error throwing |\n| **CLI Adapter** | `src/cli.ts` | Command-line argument parsing, stdout/stderr output |\n| **MCP Adapter** | `src/mcp.ts` | Model Context Protocol stdio server, tool registration |\n| **Write Sandbox** | `src/sandbox.ts` | Path validation for file saves 
|\n\n资料来源：[src/core.ts](https://github.com/vasylenko/markfetch/blob/main/src/core.ts), [src/cli.ts](https://github.com/vasylenko/markfetch/blob/main/src/cli.ts), [src/mcp.ts](https://github.com/vasylenko/markfetch/blob/main/src/mcp.ts)\n\n## Two Operating Modes\n\n### CLI Mode\n\nThe command-line interface accepts a URL and outputs markdown to stdout:\n\n```bash\nmarkfetch https://en.wikipedia.org/wiki/Markdown\n```\n\nOptions include:\n- `-o, --output <path>` — Save markdown to a file\n- `-V, --version` — Print version\n- `-h, --help` — Print usage\n\nThe CLI respects the same environment variables as the MCP mode and resolves relative output paths against the current working directory.\n\n资料来源：[README.md](https://github.com/vasylenko/markfetch/blob/main/README.md), [src/cli.ts](https://github.com/vasylenko/markfetch/blob/main/src/cli.ts)\n\n### MCP Mode\n\nThe Model Context Protocol server provides a single tool `fetch_markdown(url, savePath?)` for integration with LLM clients like Claude Code, Cursor, or Goose:\n\n```json\n{\n  \"mcpServers\": {\n    \"markfetch\": {\n      \"command\": \"npx\",\n      \"args\": [\"-y\", \"markfetch\"]\n    }\n  }\n}\n```\n\nThe MCP mode has additional security features:\n- **Write sandbox**: File saves are restricted to allowed write roots\n- **Lazy loading**: The CLI adapter is never loaded in MCP mode, ensuring `console.log` is never reachable\n\n资料来源：[src/mcp.ts](https://github.com/vasylenko/markfetch/blob/main/src/mcp.ts), [src/cli.ts](https://github.com/vasylenko/markfetch/blob/main/src/cli.ts)\n\n## Content Extraction Pipeline\n\nThe markdown conversion process involves several stages:\n\n```mermaid\ngraph LR\n    A[HTML Response] --> B[Decode Encoded Tags]\n    B --> C[Ensure Base Href]\n    C --> D[Rewrite for Readability]\n    D --> E[Readability Parse]\n    E --> F[Turndown Convert]\n    F --> G[Prune Empty Headings]\n    G --> H[Clean Markdown]\n```\n\n### Extraction Details\n\n1. 
**Encoded Tag Decoding**: Handles HTML entities like `&lt;code&gt;` in code blocks\n2. **Base Href Injection**: Ensures relative URLs become absolute using the canonical URL\n3. **Pre-processing Rewrites**: Handles footnotes, `<details>` elements, and MediaWiki-specific structures\n4. **Readability Parsing**: Extracts main article content using Mozilla Readability with `keepClasses: true` to preserve language hints on code blocks\n5. **Markdown Conversion**: Uses Turndown with a custom escape function to avoid noisy backslash escapes\n6. **Heading Pruning**: Removes empty headings left by stripped interactive widgets\n\n资料来源：[src/core.ts](https://github.com/vasylenko/markfetch/blob/main/src/core.ts)\n\n## Error Handling\n\nmarkfetch provides 8 deterministic error codes:\n\n| Error Code | Meaning |\n|------------|---------|\n| `network_error` | DNS, TCP, TLS failure, or unexpected fetcher error |\n| `http_error` | Non-2xx status from upstream |\n| `timeout` | Request exceeded `MARKFETCH_TIMEOUT_MS` |\n| `unsupported_content_type` | Response is not `text/html` or `application/xhtml+xml` |\n| `extraction_failed` | Readability found no article content (typical for SPAs) |\n| `too_large` | Content exceeded `MARKFETCH_MAX_BYTES` |\n| `save_failed` | File write failed (permission, missing directory) |\n| `save_forbidden` | `savePath` resolves outside allowed write roots |\n\nAll errors are thrown uniformly from `core.ts` as `MarkfetchError` and caught by adapters for translation to their respective output formats.\n\n资料来源：[README.md](https://github.com/vasylenko/markfetch/blob/main/README.md)\n\n## Configuration\n\n| Variable | Default | Purpose |\n|----------|---------|---------|\n| `MARKFETCH_TIMEOUT_MS` | `30000` | Per-request timeout in milliseconds |\n| `MARKFETCH_MAX_BYTES` | `5000000` | Cap on response body and extracted markdown |\n| `MARKFETCH_USER_AGENT` | Chrome 130 string | Browser fingerprint; must be Chrome UA |\n| `MARKFETCH_ALLOWED_WRITE_ROOTS` | 
`os.tmpdir()` + `process.cwd()` | MCP-only; allowed file save paths |\n\n资料来源：[README.md](https://github.com/vasylenko/markfetch/blob/main/README.md)\n\n## What markfetch Is Not\n\nUnderstanding the boundaries helps set correct expectations:\n\n| Limitation | Explanation |\n|------------|-------------|\n| **Not a crawler** | One URL in, one document out. No recursion, `robots.txt` parsing, or rate limiting. |\n| **Not authenticated** | Anonymous fetch only. Pages behind login walls return public content or `http_error`. |\n| **Not a JS renderer** | Pure client-rendered SPAs with no static HTML return `extraction_failed`. SPAs with server-rendered content will extract what they ship. |\n\n资料来源：[README.md](https://github.com/vasylenko/markfetch/blob/main/README.md)\n\n## Requirements\n\n- **Node.js ≥ 24**\n- **npm** for installation\n\n## Quick Start\n\n```bash\n# Install globally\nnpm i -g markfetch\n\n# Fetch a URL\nmarkfetch https://en.wikipedia.org/wiki/Markdown\n\n# Save to file\nmarkfetch https://example.com/article -o output.md\n```\n\n资料来源：[README.md](https://github.com/vasylenko/markfetch/blob/main/README.md)\n\n## Version History\n\n| Version | Date | Key Changes |\n|---------|------|-------------|\n| 0.6.0 | 2026-05-13 | Write sandbox, `save_forbidden` error, CI matrix expansion |\n| 0.5.0 | 2026-05-12 | CLI mode with lazy-loading dispatcher |\n| 0.4.1 | 2026-05-11 | Bug fixes and documentation improvements |\n| 0.4.0 | 2026-05-10 | MCP server with single `fetch_markdown` tool |\n\n资料来源：[CHANGELOG.md](https://github.com/vasylenko/markfetch/blob/main/CHANGELOG.md)\n\n---\n\n<a id='quickstart'></a>\n\n## Quick Start Guide\n\n### 相关页面\n\n相关主题：[Introduction](#introduction), [CLI Usage](#cli-usage), [MCP Server Integration](#mcp-server)\n\n<details>\n<summary>相关源码文件</summary>\n\n以下源码文件用于生成本页说明：\n\n- [README.md](https://github.com/vasylenko/markfetch/blob/main/README.md)\n- [package.json](https://github.com/vasylenko/markfetch/blob/main/package.json)\n- 
[src/cli.ts](https://github.com/vasylenko/markfetch/blob/main/src/cli.ts)\n- [src/mcp.ts](https://github.com/vasylenko/markfetch/blob/main/src/mcp.ts)\n</details>\n\n# Quick Start Guide\n\nmarkfetch is a tool that fetches URLs and returns clean markdown output. It operates as both a CLI command and an MCP (Model Context Protocol) server, making it suitable for AI agents like Claude Code, Codex, and Gemini CLI.\n\n## Installation\n\n### Prerequisites\n\n- Node.js ≥ 24 资料来源：[package.json:8]()\n\n### CLI Installation (Global)\n\n```bash\nnpm i -g markfetch\n```\n\nAfter installation, the `markfetch` command is available globally. 资料来源：[README.md:38]()\n\n### CLI Installation (npx)\n\nFor one-off usage without global installation:\n\n```bash\nnpx -y markfetch <url>\n```\n\n### MCP Server Setup\n\nAdd markfetch to your MCP client configuration. The setup varies by client.\n\n#### Claude Code\n\n```bash\nclaude mcp add --scope user markfetch -- npx -y markfetch\n```\n\n#### Codex\n\n```json\n{\n  \"mcpServers\": {\n    \"markfetch\": {\n      \"command\": \"npx\",\n      \"args\": [\"-y\", \"markfetch\"]\n    }\n  }\n}\n```\n\n#### Gemini CLI\n\n```bash\ngemini mcp add -s user markfetch npx -y markfetch\n```\n\n#### Cursor / Goose / Other stdio-MCP Clients\n\n```json\n{\n  \"mcpServers\": {\n    \"markfetch\": {\n      \"command\": \"npx\",\n      \"args\": [\"-y\", \"markfetch\"]\n    }\n  }\n}\n```\n\n资料来源：[README.md:46-69]()\n\n## CLI Usage\n\n### Basic Fetch\n\n```bash\nmarkfetch <url>\n```\n\nThe fetched markdown is printed to stdout. 资料来源：[src/cli.ts:18]()\n\n### Save to File\n\n```bash\nmarkfetch <url> -o <path>\n```\n\nUse `-o` or `--output` to save markdown to a file. Relative paths resolve against the current working directory. 
资料来源：[src/cli.ts:12-15]()\n\nExample:\n\n```bash\nmarkfetch https://en.wikipedia.org/wiki/Markdown -o output.md\n```\n\n### Help and Version\n\n```bash\nmarkfetch --help\nmarkfetch --version\n```\n\n## MCP Tool Usage\n\n### Tool Name\n\n`fetch_markdown`\n\n### Parameters\n\n| Parameter | Type | Required | Description |\n|-----------|------|----------|-------------|\n| `url` | string | Yes | Absolute http(s) URL to fetch. The server follows redirects automatically. No authentication headers, cookies, or session state are sent. |\n| `savePath` | string | No | Absolute filesystem path. When provided, the fetched markdown is written to this path instead of returned in the response. |\n\n资料来源：[src/mcp.ts:22-33]()\n\n### Return Value\n\nThe tool returns markdown content in `content[0].text`. No `structuredContent` field is used — this ensures compatibility with MCP clients that forward only `structuredContent` to the model. 资料来源：[README.md:18-21]()\n\n## Environment Configuration\n\n| Variable | Default | Purpose |\n|----------|---------|---------|\n| `MARKFETCH_TIMEOUT_MS` | `30000` | Per-request timeout in milliseconds |\n| `MARKFETCH_MAX_BYTES` | `5000000` | Cap on response body and extracted markdown (5MB) |\n| `MARKFETCH_USER_AGENT` | Pinned Chrome 130 string | Override the User-Agent header. Must be a Chrome UA string. |\n| `MARKFETCH_ALLOWED_WRITE_ROOTS` | `os.tmpdir()` + `process.cwd()` | MCP-only. Colon-delimited (POSIX) or semicolon-delimited (Windows) list of absolute paths permitted for `savePath` writes. 
|\n\n资料来源：[README.md:99-103]()\n\n### Passing Environment Variables to MCP\n\n```json\n{\n  \"mcpServers\": {\n    \"markfetch\": {\n      \"command\": \"npx\",\n      \"args\": [\"-y\", \"markfetch\"],\n      \"env\": {\n        \"MARKFETCH_TIMEOUT_MS\": \"60000\"\n      }\n    }\n  }\n}\n```\n\n## Error Handling\n\nErrors are returned with deterministic codes in the format `[code] message`:\n\n| Code | Meaning |\n|------|---------|\n| `network_error` | DNS, TCP, or TLS failure |\n| `http_error` | Upstream returned a non-2xx status |\n| `timeout` | Request exceeded `MARKFETCH_TIMEOUT_MS` |\n| `unsupported_content_type` | Response was not `text/html` or `application/xhtml+xml` |\n| `extraction_failed` | Readability found no article content (typical for pure client-rendered SPAs) |\n| `too_large` | Response body or extracted markdown exceeded `MARKFETCH_MAX_BYTES` |\n| `save_failed` | `writeFile` failed (missing directory, permission denied) |\n| `save_forbidden` | `savePath` resolves outside the allowed write roots |\n\nErrors go to stderr with non-zero exit status in CLI mode. 
资料来源：[README.md:72-85]()\n\n## Quick Workflow\n\n```mermaid\ngraph TD\n    A[Start markfetch] --> B{Arguments provided?}\n    B -->|Yes, URL argument| C[CLI Mode]\n    B -->|No arguments| D[MCP Server Mode]\n    C --> E[Fetch URL]\n    D --> F[Wait for MCP request]\n    E --> G{Output path specified?}\n    F --> H[Receive fetch_markdown request]\n    G -->|No| I[Print to stdout]\n    G -->|Yes, -o path| J[Write to file]\n    H --> I\n    J --> K[Return confirmation]\n    I --> L[Return markdown content]\n    K --> L\n```\n\n## Use Cases\n\n| Use Case | Recommended Mode | Command/Config |\n|----------|-------------------|----------------|\n| One-time URL fetch in shell | CLI | `markfetch <url>` |\n| Batch processing with shell scripts | CLI + `-o` | `markfetch <url> -o out.md` |\n| AI agent web content retrieval | MCP | Configure in client |\n| Large document bypass inline limits | MCP + `savePath` | Set `savePath` to local file |\n\n## Limitations\n\n- **Not a crawler**: No recursion, no `robots.txt` parsing. One URL in, one document out. 资料来源：[README.md:89-91]()\n- **Not authenticated**: Anonymous fetch only. Pages behind login walls return whatever the public response is. 资料来源：[README.md:93-95]()\n- **Not a JS renderer**: Pure client-rendered SPAs with no static HTML return `extraction_failed`. 
资料来源：[README.md:97-99]()\n\n---\n\n<a id='processing-pipeline'></a>\n\n## Processing Pipeline\n\n### 相关页面\n\n相关主题：[Introduction](#introduction), [HTTP/2 Fingerprinting](#http-fingerprinting), [Error Handling](#error-handling)\n\n<details>\n<summary>相关源码文件</summary>\n\n以下源码文件用于生成本页说明：\n\n- [src/core.ts](https://github.com/vasylenko/markfetch/blob/main/src/core.ts)\n- [src/cli.ts](https://github.com/vasylenko/markfetch/blob/main/src/cli.ts)\n- [src/mcp.ts](https://github.com/vasylenko/markfetch/blob/main/src/mcp.ts)\n- [src/sandbox.ts](https://github.com/vasylenko/markfetch/blob/main/src/sandbox.ts)\n- [package.json](https://github.com/vasylenko/markfetch/blob/main/package.json)\n- [README.md](https://github.com/vasylenko/markfetch/blob/main/README.md)\n</details>\n\n# Processing Pipeline\n\n## Overview\n\nThe Processing Pipeline is the core data flow engine in markfetch. It transforms raw HTML fetched from a URL into clean, readable markdown suitable for consumption by AI agents and language models. 
The pipeline is intentionally single-purpose — one URL in, one markdown document out — with no recursion, pagination, or client-side JavaScript rendering.\n\nThe pipeline operates identically whether invoked via CLI or MCP adapter, ensuring consistent behavior across both interfaces.\n\n资料来源：[src/core.ts](https://github.com/vasylenko/markfetch/blob/main/src/core.ts)\n\n## Architecture\n\nThe pipeline is composed of three primary stages executed sequentially:\n\n```mermaid\ngraph TD\n    A[URL Input] --> B[HTTP Fetch]\n    B --> C{HTML Valid?}\n    C -->|No| D[Error: network_error / http_error / timeout]\n    C -->|Yes| E[Content-Type Check]\n    E -->|Non-HTML| F[Error: unsupported_content_type]\n    E -->|HTML| G[Extract Article]\n    G -->|No Content| H[Error: extraction_failed]\n    G -->|Extracted| I[Convert to Markdown]\n    I --> J{Size Check}\n    J -->|Exceeds Limit| K[Error: too_large]\n    J -->|Valid| L{Save Path?}\n    L -->|Yes| M[Write to File / Error: save_forbidden / save_failed]\n    L -->|No| N[Return Markdown]\n```\n\nEach stage performs validation and may abort with a deterministic error code, ensuring failures are predictable and actionable.\n\n资料来源：[src/core.ts](https://github.com/vasylenko/markfetch/blob/main/src/core.ts)\n\n## Stage 1: HTTP Fetch\n\nThe fetch stage retrieves raw HTML from the target URL using Node.js `fetch` with a real-browser fingerprint.\n\n### Transport Configuration\n\n| Setting | Value | Purpose |\n|---------|-------|---------|\n| Protocol | HTTP/2 | Modern web fingerprint |\n| User-Agent | Chrome 130 (pinned) | Realistic browser identification |\n| Client Hints | Sec-CH-UA-* headers | Derived from User-Agent at startup |\n| Timeout | `MARKFETCH_TIMEOUT_MS` (default: 30000ms) | Per-request budget |\n\nThe User-Agent string is validated at startup. 
Non-Chrome strings fail fast to prevent fingerprint inconsistencies that could trigger bot detection.\n\nSource: [README.md](https://github.com/vasylenko/markfetch/blob/main/README.md)\n\n### Error Conditions\n\n| Code | Trigger |\n|------|---------|\n| `network_error` | DNS failure, TCP failure, TLS error, unexpected fetcher error |\n| `http_error` | Non-2xx HTTP status code |\n| `timeout` | Request exceeds `MARKFETCH_TIMEOUT_MS` |\n\nRedirects are followed automatically by the underlying HTTP client.\n\n## Stage 2: Article Extraction\n\nArticle extraction identifies and isolates the main content from the fetched HTML, stripping navigation, sidebars, footers, and other boilerplate.\n\n### Technology Stack\n\n| Component | Library | Purpose |\n|-----------|---------|---------|\n| HTML Parser | `linkedom` | Parses HTML into a DOM-like structure |\n| Extraction | `@mozilla/readability` | Identifies main article content |\n| Configuration | `keepClasses: true` | Preserves code block language hints |\n\nThe `linkedom` parser is used instead of a browser-native `DOMParser`, which Node.js does not provide, to ensure consistent behavior across Node.js versions and environments.\n\nSource: [src/core.ts](https://github.com/vasylenko/markfetch/blob/main/src/core.ts)\n\n### Pre-Extraction Rewrites\n\nBefore Readability processes the document, the pipeline applies targeted HTML rewrites to normalize content and improve extraction quality:\n\n```typescript\nfunction rewriteForReadability(document: Document): void {\n  // Normalize code blocks (pre and code elements)\n  // Convert aside elements to sections\n  // Expand details/summary elements\n  // Flatten MediaWiki heading wrappers\n}\n```\n\nSpecific transformations include:\n\n| Transform | Target | Action |\n|-----------|--------|--------|\n| Code block normalization | `<pre>`, `<code>` | Standardize encoding artifacts |\n| Base href injection | `<head>` / `<html>` | Ensure absolute URLs after redirects |\n| Aside conversion | `<aside>` with footnote roles | Convert to 
`<section>` |\n| Details expansion | `<details>`, `<summary>` | Inline content |\n| Heading unwrapping | `div.mw-heading` | Remove MediaWiki wrappers |\n\n### Base Href Handling\n\nReadability and linkedom leave relative URLs unresolved unless a `<base>` element exists. The pipeline injects the post-redirect canonical URL to ensure all `href` and `src` attributes resolve correctly:\n\n```typescript\nfunction ensureBaseHref(html: string, url: string): string {\n  const safeUrl = url.replaceAll(\"&\", \"&amp;\").replaceAll('\"', \"&quot;\");\n  const stripped = html.replaceAll(/<base\\s[^>]*>/gi, \"\");\n  // Inject <base href=\"...\"> into <head> or <html>\n}\n```\n\n### Error Conditions\n\n| Code | Trigger |\n|------|---------|\n| `unsupported_content_type` | Response is not `text/html` or `application/xhtml+xml` |\n| `extraction_failed` | Readability returned empty content (typical for client-rendered SPAs) |\n\nSource: [src/core.ts](https://github.com/vasylenko/markfetch/blob/main/src/core.ts)\n\n## Stage 3: Markdown Conversion\n\nThe conversion stage transforms extracted HTML into clean markdown using Turndown with custom rules.\n\n### Technology Stack\n\n| Component | Library | Notes |\n|-----------|---------|-------|\n| HTML-to-MD | `turndown` | Configured with GFM rules |\n| Code fences | Custom rule | Preserves `class=\"language-X\"` as hint |\n\n### Custom Escape Behavior\n\nTurndown's default escape mechanism inserts backslashes before certain character sequences that might be misinterpreted as markdown. 
The pipeline removes two categories of unnecessary escapes:\n\n| Pattern | Before | After | Rationale |\n|---------|--------|-------|-----------|\n| Intraword underscores | `\\_` | `_` | Intraword underscores are valid |\n| Mid-line dash/equals | `\\-X`, `\\=X` | `-X`, `=X` | Not list markers or underlines when alphanumeric follows |\n\nThis prevents the output from containing visible escape characters that don't affect rendering.\n\n### Empty Heading Pruning\n\nThe conversion includes iterative pruning of empty headings — headings immediately followed by another heading with no body content. This commonly occurs when Readability strips interactive widgets (browser-compat tables, spec diagrams) but leaves the surrounding heading structure.\n\n### Title Handling\n\n| Condition | Output |\n|-----------|--------|\n| Content starts with `<h1>` | Use content heading, no duplicate |\n| Content lacks heading | Prepend `# {title}` from Readability |\n\n### Output Format\n\n```markdown\n# Page Title (if not already in content)\n\nArticle body with clean markdown conversion...\n```\n\n## Stage 4: Size Validation and Output\n\n### Size Limits\n\n| Limit | Environment Variable | Default |\n|-------|---------------------|---------|\n| Response body | `MARKFETCH_MAX_BYTES` | 5,000,000 bytes |\n| Extracted markdown | Same variable | Same default |\n\nThe pipeline checks both the raw HTTP response size and the final markdown size against this cap.\n\n### Error Conditions\n\n| Code | Trigger |\n|------|---------|\n| `too_large` | Body or markdown exceeds `MARKFETCH_MAX_BYTES` |\n\n### Output Routing\n\n| Mode | Destination | Behavior |\n|------|-------------|----------|\n| No `savePath` | Return value | `markdown` field contains content |\n| `savePath` (MCP) | File system | `savedTo` field contains path |\n| `savePath` (CLI) | File system | Confirmation to stdout |\n\n## Write Sandbox (MCP Only)\n\nWhen used as an MCP tool with a `savePath` parameter, writes are confined to an 
allowed set of root directories.\n\n### Default Roots\n\n| Platform | Roots |\n|----------|-------|\n| POSIX | `os.tmpdir()`, `process.cwd()` |\n| Windows | Same, case-insensitive comparison |\n\n### Configuration\n\n`MARKFETCH_ALLOWED_WRITE_ROOTS` overrides the defaults entirely. Multiple paths are separated by the platform's path delimiter:\n\n| Platform | Delimiter | Example |\n|----------|-----------|---------|\n| POSIX | `:` | `/Users/me/out:/tmp` |\n| Windows | `;` | `C:\\Users\\me\\out;C:\\Temp` |\n\n### Error Conditions\n\n| Code | Trigger |\n|------|---------|\n| `save_forbidden` | `savePath` resolves outside allowed roots |\n| `save_failed` | `writeFile` failed (permissions, missing directory) |\n\nThe sandbox applies only to MCP mode. The CLI has no restrictions: the human at the shell is the security boundary.\n\nSource: [src/sandbox.ts](https://github.com/vasylenko/markfetch/blob/main/src/sandbox.ts)\n\n## Error Codes Reference\n\nThe pipeline returns exactly eight deterministic error codes:\n\n| Code | Stage | Description |\n|------|-------|-------------|\n| `network_error` | Fetch | DNS/TCP/TLS failure |\n| `http_error` | Fetch | Non-2xx status |\n| `timeout` | Fetch | Exceeded timeout budget |\n| `unsupported_content_type` | Fetch | Not HTML/XHTML |\n| `extraction_failed` | Extract | Readability found no content |\n| `too_large` | Convert/Validate | Exceeded size cap |\n| `save_forbidden` | Output | Path outside sandbox |\n| `save_failed` | Output | File write failed |\n\nAll errors use the format `[code] message` for easy parsing by consuming tools.\n\nSource: [README.md](https://github.com/vasylenko/markfetch/blob/main/README.md)\n\n## Data Flow Summary\n\n```mermaid\ngraph LR\n    A[URL] --> B[HTTP Fetch]\n    B --> C{HTML?}\n    C -->|Yes| D[HTML Rewrites]\n    C -->|No| E[Error]\n    D --> F[Readability]\n    F --> G[Extract Content]\n    G --> H[Turndown]\n    H --> I[Size Check]\n    I -->|OK| J[Output]\n    I -->|Large| K[Error]\n    J --> L{savePath?}\n    L -->|No| 
M[Return Markdown]\n    L -->|Yes| N[Write File]\n```\n\n## Pipeline Entry Points\n\n### CLI Adapter\n\nThe CLI adapter (`src/cli.ts`) parses arguments and delegates to the core pipeline:\n\n```typescript\nconst { markdown, bytes, savedTo } = await fetchMarkdown({\n  url,\n  // Resolve -o against the working directory only when it was provided\n  savePath: options.output\n    ? resolve(process.cwd(), options.output)\n    : undefined\n});\n```\n\nOutput behavior:\n- With `-o`: prints `Saved N bytes to <path>` to stdout\n- Without `-o`: writes raw markdown to stdout via `process.stdout.write`\n\nErrors print to stderr with `[code] message` format.\n\n### MCP Adapter\n\nThe MCP adapter (`src/mcp.ts`) registers the `fetch_markdown` tool and calls the core pipeline:\n\n```typescript\nserver.registerTool(\"fetch_markdown\", {\n  description: \"Fetch a single public HTTP/S URL...\",\n  inputSchema: {\n    url: z.string().url(),\n    savePath: z.string().refine(isAbsolute).optional()\n  }\n}, async ({ url, savePath }) => {\n  // Handler body elided: delegates to fetchMarkdown({ url, savePath })\n});\n```\n\nOutput is always returned via `content[0].text`, never `structuredContent`, ensuring compatibility with clients that only forward `content[]`.\n\nSources: [src/cli.ts](https://github.com/vasylenko/markfetch/blob/main/src/cli.ts), [src/mcp.ts](https://github.com/vasylenko/markfetch/blob/main/src/mcp.ts)\n\n## Configuration Options\n\n| Variable | Default | Applies To | Purpose |\n|----------|---------|------------|---------|\n| `MARKFETCH_TIMEOUT_MS` | `30000` | Both | Per-request timeout |\n| `MARKFETCH_MAX_BYTES` | `5000000` | Both | Size cap |\n| `MARKFETCH_USER_AGENT` | Chrome 130 | Both | Browser fingerprint |\n| `MARKFETCH_ALLOWED_WRITE_ROOTS` | tmpdir + cwd | MCP only | Write sandbox roots |\n\nAll variables are validated at startup with fail-fast behavior: invalid values terminate the process immediately with a stderr message.\n\nSource: [README.md](https://github.com/vasylenko/markfetch/blob/main/README.md)\n\n## Dependencies\n\n| Package | Type | Role |\n|---------|------|------|\n| `linkedom` | runtime | HTML parsing |\n| `@mozilla/readability` | runtime | Article extraction |\n| 
`turndown` | runtime | HTML-to-markdown |\n| `turndown-plugin-gfm` | runtime | GitHub Flavored Markdown |\n| `commander` | runtime | CLI argument parsing |\n| `@modelcontextprotocol/sdk` | runtime | MCP server framework |\n\nNode.js ≥ 24 is required for native `fetch` support.\n\nSource: [package.json](https://github.com/vasylenko/markfetch/blob/main/package.json)\n\n---\n\n<a id='http-fingerprinting'></a>\n\n## HTTP/2 Fingerprinting\n\n### Related Pages\n\nRelated topics: [Processing Pipeline](#processing-pipeline), [Environment Variables](#environment-variables)\n\n<details>\n<summary>Relevant Source Files</summary>\n\nThe following source files were used to generate this page:\n\n- [src/core.ts](https://github.com/vasylenko/markfetch/blob/main/src/core.ts)\n- [src/mcp.ts](https://github.com/vasylenko/markfetch/blob/main/src/mcp.ts)\n- [src/cli.ts](https://github.com/vasylenko/markfetch/blob/main/src/cli.ts)\n- [README.md](https://github.com/vasylenko/markfetch/blob/main/README.md)\n- [CHANGELOG.md](https://github.com/vasylenko/markfetch/blob/main/CHANGELOG.md)\n- [package.json](https://github.com/vasylenko/markfetch/blob/main/package.json)\n</details>\n\n# HTTP/2 Fingerprinting\n\n## Overview\n\nHTTP/2 Fingerprinting is a technique used by markfetch to mimic real browser traffic when fetching web pages. Instead of making requests that appear to come from a typical HTTP library (like curl or a basic fetch implementation), markfetch generates HTTP/2 requests with headers and client hints that closely match those of an actual Chrome browser session.\n\nThis approach serves two purposes:\n\n1. **Bypass anti-bot measures**: Many websites employ fingerprinting techniques to detect and block automated scrapers. By presenting headers consistent with a genuine Chrome browser, markfetch is less likely to trigger these defenses.\n2. **Access SEO-rendered content**: Sites that serve different content to bots vs. 
browsers are more likely to return the full article content when markfetch requests arrive with Chrome-like fingerprints.\n\nSource: [README.md](https://github.com/vasylenko/markfetch/blob/main/README.md)\n\n## Architecture\n\n```mermaid\ngraph TD\n    A[URL Request] --> B{Adapter Type?}\n    B -->|MCP| C[src/mcp.ts]\n    B -->|CLI| D[src/cli.ts]\n    C --> E[src/core.ts - fetchMarkdown]\n    D --> E\n    E --> F[Undici Dispatcher]\n    F --> G[HTTP/2 Transport]\n    G --> H[Sec-CH-UA-* Client Hints]\n    G --> I[Chrome Headers]\n    H --> J[Upstream Server]\n    I --> J\n    J --> K[HTML Response]\n    K --> L[Readability Parser]\n    L --> M[Markdown Output]\n```\n\n## Implementation Details\n\n### User Agent String\n\nThe default user agent is a pinned Chrome 130 string. It can be overridden via the `MARKFETCH_USER_AGENT` environment variable, but the override must be a valid Chrome UA string.\n\n| Environment Variable | Default Value | Purpose |\n|---|---|---|\n| `MARKFETCH_USER_AGENT` | Pinned Chrome 130 string | Override the browser fingerprint UA |\n\n**Constraint**: The UA string must be a Chrome browser UA. Non-Chrome strings fail fast at startup because `Sec-CH-UA-*` client hints are derived from the UA at initialization time.\n\nSource: [README.md](https://github.com/vasylenko/markfetch/blob/main/README.md)\n\n### Client Hints Generation\n\nAt startup, markfetch parses the `MARKFETCH_USER_AGENT` value and derives `Sec-CH-UA-*` client hint headers from it. 
These hints are sent with every HTTP/2 request and include:\n\n- `Sec-CH-UA`: browser brand and version\n- `Sec-CH-UA-Mobile`: mobile indicator\n- `Sec-CH-UA-Platform`: operating system\n\n```mermaid\ngraph LR\n    A[MARKFETCH_USER_AGENT<br/>Chrome 130] --> B[Startup<br/>Initialization]\n    B --> C[Sec-CH-UA Header<br/>Derived Value]\n    B --> D[Sec-CH-UA-Mobile<br/>Derived Value]\n    B --> E[Sec-CH-UA-Platform<br/>Derived Value]\n    C --> F[Every HTTP/2<br/>Request]\n    D --> F\n    E --> F\n```\n\nSource: [README.md](https://github.com/vasylenko/markfetch/blob/main/README.md)\n\n### HTTP/2 Transport\n\nMarkfetch uses the undici HTTP client library with HTTP/2 protocol support. The HTTP/2 transport is selected automatically by undici when the server supports it, enabling:\n\n- Multiplexed requests over a single connection\n- Header compression\n- Server push capabilities\n\nThe combination of HTTP/2 transport and a coherent Chrome header set produces a request fingerprint that closely resembles a real Chrome session.\n\nSource: [README.md](https://github.com/vasylenko/markfetch/blob/main/README.md)\n\n### Request Flow\n\n```mermaid\nsequenceDiagram\n    participant Client\n    participant Markfetch\n    participant Undici\n    participant Server\n\n    Client->>Markfetch: fetch_markdown(url)\n    Markfetch->>Markfetch: Reuse headers derived at startup\n    Markfetch->>Undici: Dispatch with Chrome headers\n    Undici->>Server: GET /path (HTTP/2)<br/>Sec-CH-UA: \"Chromium\"<br/>Sec-CH-UA-Mobile: ?0<br/>Sec-CH-UA-Platform: \"Windows\"\n    Server->>Undici: HTTP/2 200 OK<br/>text/html\n    Undici->>Markfetch: HTML Content\n    Markfetch->>Markfetch: Apply Readability\n    Markfetch->>Markfetch: Convert to Markdown\n    Markfetch->>Client: Clean Markdown\n```\n\n## Configuration\n\n### Environment Variables\n\n| Variable | Default | Purpose |\n|---|---|---|\n| `MARKFETCH_TIMEOUT_MS` | 
`30000` | Per-request timeout in milliseconds |\n| `MARKFETCH_MAX_BYTES` | `5000000` | Cap on response body and extracted markdown |\n| `MARKFETCH_USER_AGENT` | Pinned Chrome 130 | Browser fingerprint override |\n\n### Validation\n\nAll environment variables are validated at startup. Invalid values cause the process to exit immediately with a descriptive message on stderr, rather than producing confusing per-request errors.\n\nSource: [README.md](https://github.com/vasylenko/markfetch/blob/main/README.md)\n\n## Integration Points\n\n### MCP Adapter\n\nThe MCP server (`src/mcp.ts`) uses the core fetch pipeline, which includes HTTP/2 fingerprinting. The tool description summarizes what the tool does:\n\n> Fetch a single public HTTP/S URL and return its main article content as clean markdown. Best for articles, documentation, blog posts, news, and reference pages. Non-HTML responses return `unsupported_content_type`.\n\nSource: [src/mcp.ts](https://github.com/vasylenko/markfetch/blob/main/src/mcp.ts)\n\n### CLI Adapter\n\nThe CLI adapter (`src/cli.ts`) uses the same core fetch pipeline, ensuring consistent HTTP/2 fingerprinting behavior whether invoked via MCP or command line:\n\n```bash\nmarkfetch https://en.wikipedia.org/wiki/Markdown\n```\n\nSource: [src/cli.ts](https://github.com/vasylenko/markfetch/blob/main/src/cli.ts)\n\n## Version History\n\n| Version | Date | Change |\n|---|---|---|\n| 0.4.0 | 2026-05-10 | HTTP/2 fingerprinting feature added with Sec-CH-UA-* client hints |\n| 0.5.0 | 2026-05-12 | CLI mode added with same fingerprinting behavior |\n| 0.6.0 | Current | Enhanced write sandbox and validation |\n\nSource: [CHANGELOG.md](https://github.com/vasylenko/markfetch/blob/main/CHANGELOG.md)\n\n## Limitations\n\n### SPA Handling\n\nPure client-rendered Single Page Applications (SPAs) with no static HTML content return `extraction_failed`. 
For sites that ship server-rendered or SEO-prerendered HTML, markfetch extracts whatever static content they expose, including when accessed with Chrome fingerprints.\n\n### Authentication\n\nMarkfetch performs anonymous fetches only: no cookie jar, no auth headers, no session reuse. Pages behind login walls return whatever the public response is, usually surfaced as `http_error`.\n\nSource: [README.md](https://github.com/vasylenko/markfetch/blob/main/README.md)\n\n## Security Considerations\n\nThe HTTP/2 fingerprinting approach makes automated requests look like ordinary browser traffic, so responsible use is left to the caller. The documentation explicitly states:\n\n> Use it on URLs whose targets you have permission to fetch, and respect the terms of service of any site you query. The maintainer assumes no liability for misuse.\n\nSource: [README.md](https://github.com/vasylenko/markfetch/blob/main/README.md)\n\n---\n\n<a id='cli-usage'></a>\n\n## CLI Usage\n\n### Related Pages\n\nRelated topics: [Quick Start Guide](#quickstart), [MCP Server Integration](#mcp-server), [Write Sandbox Security](#write-sandbox)\n\n<details>\n<summary>Relevant Source Files</summary>\n\nThe following source files were used to generate this page:\n\n- [src/cli.ts](https://github.com/vasylenko/markfetch/blob/main/src/cli.ts)\n- [src/index.ts](https://github.com/vasylenko/markfetch/blob/main/src/index.ts)\n- [src/core.ts](https://github.com/vasylenko/markfetch/blob/main/src/core.ts)\n- [README.md](https://github.com/vasylenko/markfetch/blob/main/README.md)\n- [package.json](https://github.com/vasylenko/markfetch/blob/main/package.json)\n- [CHANGELOG.md](https://github.com/vasylenko/markfetch/blob/main/CHANGELOG.md)\n</details>\n\n# CLI Usage\n\nThe markfetch CLI provides a command-line interface for fetching URLs and converting their content to clean markdown. 
It operates as one of two execution surfaces (the other being the MCP (Model Context Protocol) stdio server), with both sharing the same underlying core pipeline.\n\n## Overview\n\nThe CLI accepts a URL as its primary argument and outputs the converted markdown to stdout or to a specified file. It was introduced in version 0.5.0 as a way to make markfetch accessible from standard shell environments, pipelines, and scripts.\n\n| Aspect | Details |\n|--------|---------|\n| Entry Point | `markfetch <url>` |\n| Output | stdout (default) or file via `-o` |\n| Version | 0.6.0 |\n| Runtime | Node.js ≥ 24 |\n| Distribution | npm package |\n\nSource: [README.md](https://github.com/vasylenko/markfetch/blob/main/README.md)\n\n## Architecture\n\nThe CLI is implemented as an adapter layer that delegates to the shared core. When the process is invoked with arguments, the dispatcher in `index.ts` lazy-loads the CLI adapter; bare invocation (zero arguments) routes to the MCP server instead.\n\n```mermaid\ngraph TD\n    A[\"markfetch CLI Invocation<br/>process.argv.length > 2\"] --> B[\"src/index.ts<br/>Dispatcher\"]\n    B --> C[\"src/cli.ts<br/>CLI Adapter\"]\n    C --> D[\"src/core.ts<br/>fetchMarkdown()\"]\n    D --> E[\"src/sandbox.ts<br/>Write Validation\"]\n    D --> F[\"HTTP Fetch + Readability + Turndown\"]\n    \n    G[\"Bare Invocation<br/>process.argv.length === 2\"] --> H[\"src/mcp.ts<br/>MCP Server\"]\n```\n\nSource: [src/cli.ts:39-47](https://github.com/vasylenko/markfetch/blob/main/src/cli.ts)\n\n## Command Syntax\n\n```bash\nmarkfetch <url> [options]\n```\n\n### Arguments\n\n| Argument | Required | Description |\n|----------|----------|-------------|\n| `<url>` | Yes | Absolute http(s) URL to fetch |\n\n### Options\n\n| Flag | Description |\n|------|-------------|\n| `-o, --output <path>` | Save markdown to a file (absolute or relative path). Default is stdout. 
|\n| `-V, --version` | Print version and exit |\n| `-h, --help` | Print usage and exit |\n\nSource: [src/cli.ts:23-30](https://github.com/vasylenko/markfetch/blob/main/src/cli.ts)\n\n## Output Behavior\n\nThe CLI maintains strict separation between its output channels:\n\n| Scenario | Channel | Content |\n|----------|---------|---------|\n| Raw markdown (no `-o`) | stdout | Raw markdown body via `process.stdout.write()` |\n| File output (`-o`) | stdout | Confirmation: `Saved N bytes to <path>` |\n| Any error | stderr | `[code] message` |\n\nThe raw markdown is written using `process.stdout.write()` rather than `console.log()`, which would append a newline. This preserves trailing whitespace and matches the exact bytes the MCP adapter would emit in `content[0].text`.\n\nSource: [src/cli.ts:50-58](https://github.com/vasylenko/markfetch/blob/main/src/cli.ts)\n\n## Error Handling\n\nErrors are written to stderr with a deterministic format: `[code] message`. The process exits with a non-zero status code.\n\n```typescript\nprocess.exitCode = 1;\nconsole.error(`[${code}] ${message}`);\n```\n\nThe CLI uses `process.exitCode` (not `process.exit()`) to ensure pending output drains before the process exits, which matters when stdout is piped to a slow consumer.\n\nSource: [src/cli.ts:58-62](https://github.com/vasylenko/markfetch/blob/main/src/cli.ts)\n\n### Error Codes\n\n| Code | Meaning |\n|------|---------|\n| `network_error` | DNS / TCP / TLS failure |\n| `http_error` | Upstream returned a non-2xx status |\n| `timeout` | Request exceeded `MARKFETCH_TIMEOUT_MS` |\n| `unsupported_content_type` | Response was not HTML |\n| `extraction_failed` | No extractable article content |\n| `too_large` | Response or markdown exceeded `MARKFETCH_MAX_BYTES` |\n| `save_failed` | File write failed (permission denied, etc.) 
|\n\nNote: `save_forbidden` is MCP-only and does not apply to the CLI (no sandbox).\n\nSource: [README.md](https://github.com/vasylenko/markfetch/blob/main/README.md)\n\n## Path Resolution\n\nThe CLI resolves relative output paths against the current working directory before passing them to the core:\n\n```typescript\nconst savePath = options.output\n  ? resolve(process.cwd(), options.output)\n  : undefined;\n```\n\nTilde expansion is intentionally **not** performed. The shell expands `~/foo` before argv reaches the process, and a quoted literal `'~/foo'` should produce a file named `~/foo` in cwd (standard tool behavior).\n\nSource: [src/cli.ts:32-39](https://github.com/vasylenko/markfetch/blob/main/src/cli.ts)\n\n## Environment Variables\n\nThese environment variables apply to both CLI and MCP modes:\n\n| Variable | Default | Purpose |\n|----------|---------|---------|\n| `MARKFETCH_TIMEOUT_MS` | `30000` | Per-request timeout in ms |\n| `MARKFETCH_MAX_BYTES` | `5000000` | Cap on response body and extracted markdown |\n| `MARKFETCH_USER_AGENT` | Chrome 130 string | Override User-Agent header |\n\nThe CLI adapter imports `fetchMarkdown` and `classifyError` from the core module, which validates these environment variables at startup.\n\nSources: [src/cli.ts:15](https://github.com/vasylenko/markfetch/blob/main/src/cli.ts) and [README.md](https://github.com/vasylenko/markfetch/blob/main/README.md)\n\n## File Structure\n\nThe project source is organized into a dispatcher, a shared core, and two adapter modules:\n\n```\nsrc/\n├── index.ts    # Dispatcher (lazy-loads cli.ts or mcp.ts)\n├── core.ts     # Shared pipeline and errors\n├── cli.ts      # CLI adapter (commander-based)\n└── mcp.ts      # MCP stdio server adapter\n```\n\nThe lazy-import pattern ensures that `cli.ts` code (which calls `console.log`) is never loaded when running in MCP mode, preserving the \"stdout is reserved for MCP frames\" invariant structurally.\n\nSources: [CHANGELOG.md](https://github.com/vasylenko/markfetch/blob/main/CHANGELOG.md) and 
[src/cli.ts:1-13](https://github.com/vasylenko/markfetch/blob/main/src/cli.ts)\n\n## Installation\n\nInstall globally via npm:\n\n```bash\nnpm i -g markfetch\n```\n\nOr run it via npx without installing:\n\n```bash\nnpx -y markfetch <url>\n```\n\nThe `bin` entry in `package.json` points to `dist/index.js`:\n\n```json\n{\n  \"bin\": {\n    \"markfetch\": \"dist/index.js\"\n  }\n}\n```\n\nSource: [package.json:16-18](https://github.com/vasylenko/markfetch/blob/main/package.json)\n\n## Usage Examples\n\n### Basic fetch to stdout\n\n```bash\nmarkfetch https://en.wikipedia.org/wiki/Markdown\n```\n\n### Save to file\n\n```bash\nmarkfetch https://example.com/article -o output.md\n```\n\n### With timeout override\n\n```bash\nMARKFETCH_TIMEOUT_MS=60000 markfetch https://slow-site.example.com\n```\n\n### Pipe to another tool\n\n```bash\nmarkfetch https://example.com/doc | grep -A5 \"## Installation\"\n```\n\nSource: [README.md](https://github.com/vasylenko/markfetch/blob/main/README.md)\n\n---\n\n<a id='mcp-server'></a>\n\n## MCP Server Integration\n\n### Related Pages\n\nRelated topics: [Quick Start Guide](#quickstart), [CLI Usage](#cli-usage), [Write Sandbox Security](#write-sandbox)\n\n<details>\n<summary>Relevant Source Files</summary>\n\nThe following source files were used to generate this page:\n\n- [src/mcp.ts](https://github.com/vasylenko/markfetch/blob/main/src/mcp.ts)\n- [src/index.ts](https://github.com/vasylenko/markfetch/blob/main/src/index.ts)\n- [package.json](https://github.com/vasylenko/markfetch/blob/main/package.json)\n- [src/cli.ts](https://github.com/vasylenko/markfetch/blob/main/src/cli.ts)\n- [src/core.ts](https://github.com/vasylenko/markfetch/blob/main/src/core.ts)\n</details>\n\n# MCP Server Integration\n\n## Overview\n\nThe MCP (Model Context Protocol) Server Integration is the primary interface for AI agents to fetch web content as clean markdown. 
Markfetch exposes a single MCP tool, `fetch_markdown`, that accepts a URL and returns extracted markdown content, enabling language models like Claude to access web information through a standardized protocol.\n\nThe MCP server operates as a stdio-based server, meaning it communicates exclusively through standard input and standard output streams. This design lets the server work with stdio-capable MCP clients including Claude Desktop, Claude Code, Cursor, and Goose.\n\n## Architecture\n\n### Entry Point Dispatcher\n\nThe `src/index.ts` file implements an argv-discriminated dispatcher that determines whether to start the MCP server or the CLI based on the presence of command-line arguments:\n\n```typescript\nif (process.argv.length === 2) {\n  await import(\"./mcp.js\");\n} else {\n  await import(\"./cli.js\");\n}\n```\n\n**Source: [src/index.ts:26-29]()**\n\nWhen `process.argv.length === 2`, `argv` holds only the Node.js executable and the script path, so the process was invoked without arguments. This is the standard pattern MCP clients use when spawning a server. 
Any extra argument (URL, flags, `--help`) routes to the CLI adapter.\n\n### Module Isolation\n\nThe dynamic import pattern ensures complete module isolation:\n\n```mermaid\ngraph TD\n    A[markfetch entry] --> B{argv.length === 2?}\n    B -->|Yes| C[Lazy import: mcp.ts]\n    B -->|No| D[Lazy import: cli.ts]\n    C --> E[\"@modelcontextprotocol/sdk loaded\"]\n    D --> F[commander loaded]\n    E -.-> G[Never reaches console.log]\n    F -.-> H[Can use console.log]\n```\n\n**Source: [src/index.ts:18-22]()**\n\nThis architecture enforces the \"stdout is reserved for MCP frames\" invariant structurally: the MCP path never imports `cli.ts`, so code that calls `console.log` is unreachable from the MCP execution path.\n\n## MCP Server Implementation\n\n### Server Initialization\n\nThe MCP server is initialized using the `@modelcontextprotocol/sdk` package:\n\n```typescript\nconst server = new McpServer({ name: \"markfetch\", version: \"0.6.0\" });\n```\n\n**Source: [src/mcp.ts:20]()**\n\n### Tool Registration\n\nThe server registers a single tool, `fetch_markdown`, with a Zod-based input schema:\n\n```typescript\nserver.registerTool(\n  \"fetch_markdown\",\n  {\n    description: \"Fetch a single public HTTP/S URL and return its main article content as clean markdown...\",\n    inputSchema: {\n      url: z.string().url().describe(\"Absolute http(s) URL of the page to fetch...\"),\n      savePath: z.string().refine(isAbsolute, \"savePath must be an absolute filesystem path\").optional().describe(\"Optional. When provided...\")\n    }\n  },\n  async ({ url, savePath }) => {\n    // Implementation\n  }\n);\n```\n\n**Source: [src/mcp.ts:22-47]()**\n\n### Tool Input Schema\n\n| Parameter | Type | Required | Description |\n|-----------|------|----------|-------------|\n| `url` | string | Yes | Absolute http(s) URL of the page to fetch. The server follows redirects automatically. No authentication headers, cookies, or session state are sent. 
|\n| `savePath` | string | No | Optional absolute filesystem path. When provided, the fetched markdown is written to this path instead of being returned inline. |\n\nThe `url` parameter is validated using Zod's `.url()` method to ensure a valid URL format. The `savePath` parameter must be an absolute path, enforced by the `.refine(isAbsolute, ...)` check.\n\n### Response Format\n\nThe tool returns a response in this structure:\n\n```typescript\n{\n  content: [{ type: \"text\", text: \"markdown content or [code] message\" }],\n  isError: boolean\n}\n```\n\n**Source: [src/mcp.ts:8-12]()**\n\n## Error Handling\n\n### Error Code System\n\nThe MCP adapter uses a uniform error code system with eight deterministic codes:\n\n| Error Code | Description | Source |\n|------------|-------------|--------|\n| `network_error` | DNS/TCP/TLS failure or unexpected internal error | core.ts |\n| `http_error` | Upstream returned non-2xx status | core.ts |\n| `timeout` | Per-request budget exceeded | core.ts |\n| `unsupported_content_type` | Response was not text/html or application/xhtml+xml | core.ts |\n| `extraction_failed` | Readability returned no article content | core.ts |\n| `too_large` | Response or markdown exceeded MARKFETCH_MAX_BYTES | core.ts |\n| `save_failed` | writeFile failed (permission denied, missing directory) | core.ts |\n| `save_forbidden` | savePath resolves outside allowed write roots | src/mcp.ts |\n\n### Error Result Factory\n\n```typescript\nfunction errorResult(code: ErrorCode, message: string) {\n  return {\n    content: [{ type: \"text\" as const, text: `[${code}] ${message}` }],\n    isError: true,\n  };\n}\n```\n\n**Source: [src/mcp.ts:8-12]()**\n\n### Error Propagation Pattern\n\nIn version 0.5.0, error handling was refactored so that core functions now `throw MarkfetchError` instead of returning error results inline. 
Both the MCP and CLI adapters catch these exceptions and convert them to their respective output formats.\n\n**Source: [CHANGELOG.md:19-21]()**\n\n## Write Sandbox (MCP-Specific)\n\nThe MCP server implements a write sandbox that restricts `savePath` operations to a set of allowed root directories.\n\n### Default Allowed Roots\n\nBy default, the allowed set is:\n- `os.tmpdir()` (system temp directory)\n- `process.cwd()` (current working directory)\n\nEach path is resolved via `fs.realpath` at startup to handle symlinks.\n\n### Configuration\n\nThe `MARKFETCH_ALLOWED_WRITE_ROOTS` environment variable overrides the default set entirely:\n\n```json\n{\n  \"mcpServers\": {\n    \"markfetch\": {\n      \"command\": \"npx\",\n      \"args\": [\"-y\", \"markfetch\"],\n      \"env\": {\n        \"MARKFETCH_ALLOWED_WRITE_ROOTS\": \"/Users/me/markfetch-out:/tmp\"\n      }\n    }\n  }\n}\n```\n\n**Source: [README.md:89-100]()**\n\n### Security Rationale\n\nThe sandbox is MCP-only by design. The CLI is unrestricted because \"a human at the shell is the security boundary.\" The asymmetry exists because the MCP tool is driven by a language model, which may be steered by content from a page it just fetched.\n\n**Source: [README.md:102-104]()**\n\n## Request Flow\n\n```mermaid\nsequenceDiagram\n    participant Client as MCP Client\n    participant MCP as MCP Server\n    participant Core as fetchMarkdown()\n    participant Fetch as HTTP Fetcher\n\n    Client->>MCP: fetch_markdown({url, savePath?})\n    MCP->>Core: fetchMarkdown({url, savePath})\n    Core->>Fetch: GET url (with Chrome fingerprint)\n    Fetch-->>Core: HTML response\n    Core->>Core: Readability parsing\n    Core->>Core: Turndown conversion\n    alt savePath provided\n        Core->>Core: Write to file (within sandbox)\n    end\n    Core-->>MCP: {markdown, bytes, savedTo?}\n    MCP-->>Client: {content: [{text: markdown}], isError: false}\n```\n\n## Environment Configuration\n\n| Variable | Default | Purpose | MCP-Specific 
|\n|----------|---------|---------|--------------|\n| `MARKFETCH_TIMEOUT_MS` | `30000` | Per-request timeout in ms | No |\n| `MARKFETCH_MAX_BYTES` | `5000000` | Cap on response body and extracted markdown | No |\n| `MARKFETCH_USER_AGENT` | Chrome 130 string | Override the User-Agent header | No |\n| `MARKFETCH_ALLOWED_WRITE_ROOTS` | `os.tmpdir()` + `process.cwd()` | Permitted write roots for savePath | **Yes** |\n\n**Source: [src/mcp.ts:1-5](), [README.md:68-75]()**\n\n## Integration with Clients\n\n### Claude Desktop / Claude Code\n\n```bash\nclaude mcp add --scope user markfetch -- npx -y markfetch\n```\n\n**Source: [README.md:40-43]()**\n\n### Codex\n\n```bash\ncodex mcp add markfetch -- npx -y markfetch\n```\n\n**Source: [README.md:46-48]()**\n\n### Manual Configuration\n\n```json\n{\n  \"mcpServers\": {\n    \"markfetch\": {\n      \"command\": \"npx\",\n      \"args\": [\"-y\", \"markfetch\"]\n    }\n  }\n}\n```\n\n**Source: [README.md:52-58]()**\n\n## Dependencies\n\nThe MCP server depends on:\n\n| Package | Version | Purpose |\n|---------|---------|---------|\n| `@modelcontextprotocol/sdk` | ^1.29.0 | MCP protocol implementation |\n| `zod` | ^3.0.0 | Input schema validation |\n| `@mozilla/readability` | ^0.5.0 | Article extraction |\n| `turndown` | ^7.0.0 | HTML to Markdown conversion |\n| `undici` | ^8.2.0 | HTTP client |\n| `linkedom` | ^0.18.0 | DOM parsing |\n\n**Source: [package.json:36-47]()**\n\n---\n\n<a id='environment-variables'></a>\n\n## Environment Variables\n\n### Related Pages\n\nRelated topics: [HTTP/2 Fingerprinting](#http-fingerprinting), [Write Sandbox Security](#write-sandbox), [Error Handling](#error-handling)\n\n<details>\n<summary>Related source files</summary>\n\nThe following source files were used to generate this page:\n\n- [src/core.ts](https://github.com/vasylenko/markfetch/blob/main/src/core.ts)\n- [src/mcp.ts](https://github.com/vasylenko/markfetch/blob/main/src/mcp.ts)\n- [src/sandbox.ts](https://github.com/vasylenko/markfetch/blob/main/src/sandbox.ts)\n- 
[README.md](https://github.com/vasylenko/markfetch/blob/main/README.md)\n</details>\n\n# Environment Variables\n\nmarkfetch uses environment variables to configure runtime behavior at startup. These variables control network timeouts, response size limits, HTTP fingerprinting, and file write permissions for the MCP server.\n\n## Overview\n\nEnvironment variables in markfetch serve as the primary configuration mechanism. Unlike per-request options, these settings apply globally to every operation and are validated once at process startup. This fail-fast design prevents misconfiguration from producing confusing per-request errors later.\n\n```mermaid\ngraph TD\n    A[Process Start] --> B[Validate MARKFETCH_TIMEOUT_MS]\n    A --> C[Validate MARKFETCH_MAX_BYTES]\n    A --> D[Validate MARKFETCH_USER_AGENT]\n    A --> E[Build MARKFETCH_ALLOWED_WRITE_ROOTS]\n    B --> F{Valid?}\n    C --> F\n    D --> F\n    E --> F\n    F -->|Yes| G[Server Ready]\n    F -->|No| H[Exit with stderr error]\n```\n\nAll validation occurs before the server begins accepting requests. Invalid values cause immediate process termination with a descriptive error message written to stderr.\n\n## Configuration Variables\n\n### MARKFETCH_TIMEOUT_MS\n\n| Property | Value |\n|----------|-------|\n| Default | `30000` (30 seconds) |\n| Purpose | Per-request timeout in milliseconds |\n| Type | Positive integer |\n\nControls the maximum duration allowed for any single HTTP request, including DNS resolution, TCP connection, TLS handshake, and response body transfer.\n\n```typescript\nconst config = {\n  timeoutMs: intEnv(\"MARKFETCH_TIMEOUT_MS\", 30_000),\n};\n```\n\nValidation rejects non-positive integers, non-integer values, and non-finite numbers (NaN, Infinity). 
A malformed value produces:\n\n```\n[core] Error: Invalid MARKFETCH_TIMEOUT_MS=\"abc\" — expected a positive integer.\n```\n\nSource: [src/core.ts:1-50]()\n\n### MARKFETCH_MAX_BYTES\n\n| Property | Value |\n|----------|-------|\n| Default | `5000000` (5 MB, about 4.77 MiB) |\n| Purpose | Cap on response body and extracted markdown |\n| Type | Positive integer |\n\nBoth the raw HTTP response body and the final extracted markdown are checked against this limit. If either exceeds the cap, the operation returns a `too_large` error.\n\n```typescript\nconst config = {\n  maxBytes: intEnv(\"MARKFETCH_MAX_BYTES\", 5_000_000),\n};\n```\n\nSource: [src/core.ts:1-50]()\n\n### MARKFETCH_USER_AGENT\n\n| Property | Value |\n|----------|-------|\n| Default | `Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/130.0.0.0 Safari/537.36` |\n| Purpose | HTTP User-Agent header and Sec-CH-UA-* client hints |\n| Type | String (must contain \"Chrome\") |\n\nThe User-Agent string determines both the HTTP header sent to servers and the derived Sec-CH-UA-* client hints. The hints are derived at startup and remain fixed for the process lifetime.\n\n```mermaid\ngraph LR\n    A[MARKFETCH_USER_AGENT] --> B[deriveClientHints]\n    B --> C[Sec-CH-UA]\n    B --> D[Sec-CH-UA-Mobile]\n    B --> E[Sec-CH-UA-Platform]\n    A --> F[User-Agent Header]\n```\n\n```typescript\nfunction deriveClientHints(ua: string): {\n  brands: string;\n  mobile: string;\n  platform: string;\n} {\n  const versionMatch = /\\bChrome\\/(\\d+)/.exec(ua);\n  if (!versionMatch) {\n    throw new Error(\n      `Invalid MARKFETCH_USER_AGENT=${JSON.stringify(ua)} — expected a Chrome User-Agent containing \"Chrome/...\"`\n    );\n  }\n  // ...\n}\n```\n\nThe UA must contain a Chrome version string. 
Non-Chrome UAs fail fast at startup to prevent fingerprinting mismatches that would increase the risk of bot detection.\n\nSource: [src/core.ts:1-50]()\n\n## Write Sandbox (MCP-Only)\n\n### MARKFETCH_ALLOWED_WRITE_ROOTS\n\n| Property | Value |\n|----------|-------|\n| Default | `os.tmpdir() ∪ process.cwd()` |\n| Purpose | Restrict MCP `savePath` writes to specific directories |\n| Type | Platform-delimiter-separated absolute paths |\n| Platform | POSIX: `:` delimiter; Windows: `;` delimiter |\n| Mode | MCP-only (CLI has no sandbox) |\n\nThis variable applies exclusively to the MCP server mode. The CLI operates without restriction, treating the human at the shell as the security boundary.\n\n```mermaid\ngraph TD\n    A[MCP savePath request] --> B{Path inside allowed roots?}\n    B -->|Yes| C[Write file]\n    B -->|No| D[Return save_forbidden error]\n    C --> E[Confirmation to client]\n    D --> F[No file created]\n```\n\nWhen set, the value **replaces** the defaults entirely rather than merging with them. To retain access to the default directories, include them explicitly:\n\n```json\n{\n  \"mcpServers\": {\n    \"markfetch\": {\n      \"command\": \"npx\",\n      \"args\": [\"-y\", \"markfetch\"],\n      \"env\": {\n        \"MARKFETCH_ALLOWED_WRITE_ROOTS\": \"/Users/me/markfetch-out:/tmp\"\n      }\n    }\n  }\n}\n```\n\nOn Windows:\n\n```json\n{\n  \"mcpServers\": {\n    \"markfetch\": {\n      \"command\": \"npx\",\n      \"args\": [\"-y\", \"markfetch\"],\n      \"env\": {\n        \"MARKFETCH_ALLOWED_WRITE_ROOTS\": \"C:\\\\Users\\\\me\\\\markfetch-out;C:\\\\Users\\\\me\\\\AppData\\\\Local\\\\Temp\"\n      }\n    }\n  }\n}\n```\n\n### Validation Rules\n\nEach entry in the list must be:\n\n1. An absolute path (relative paths fail fast)\n2. An existing directory at startup\n3. Resolved through symlinks for containment checks\n\n```typescript\nfunction buildAllowedRoots(envValue?: string): string[] {\n  // ...\n}\n```\n\nSymlinks pointing outside the sandbox are blocked. 
The canonicalized path flows from the containment check into `writeFile`, ensuring the file is created exactly at the validated location.\n\nSource: [src/sandbox.ts:1-50]()\nSource: [src/mcp.ts:1-50]()\n\n## Error Codes\n\nWhen environment variable validation fails, markfetch writes to stderr and exits with a non-zero status:\n\n| Failure Type | Trigger | Exit Status |\n|------------|---------|-------------|\n| Startup failure | Invalid MARKFETCH_TIMEOUT_MS | Non-zero |\n| Startup failure | Invalid MARKFETCH_MAX_BYTES | Non-zero |\n| Startup failure | Non-Chrome MARKFETCH_USER_AGENT | Non-zero |\n| Startup failure | Malformed MARKFETCH_ALLOWED_WRITE_ROOTS | Non-zero |\n| Runtime error | `save_forbidden` (MCP only) | None; returned to the client as a tool error |\n\nStartup errors from invalid environment values (e.g., `MARKFETCH_TIMEOUT_MS=\"abc\"`) differ from request-scoped errors like `http_error`, `timeout`, or `save_forbidden`, which are reported to the caller per request. Environment misconfiguration is always fatal at startup.\n\n## Environment Variable Summary\n\n| Variable | Default | Scope | Purpose |\n|----------|---------|-------|---------|\n| `MARKFETCH_TIMEOUT_MS` | `30000` | Both | Request timeout in ms |\n| `MARKFETCH_MAX_BYTES` | `5000000` | Both | Response and markdown size cap |\n| `MARKFETCH_USER_AGENT` | Chrome 130 string | Both | HTTP fingerprint |\n| `MARKFETCH_ALLOWED_WRITE_ROOTS` | tmpdir + cwd | MCP only | Write sandbox boundaries |\n\n## Configuration Priority\n\nEnvironment variables set at process startup take precedence over all other configuration. 
There is no runtime override mechanism; changing these values requires restarting the server.\n\n```mermaid\ngraph TD\n    A[Environment Variable] --> B[Validated at Startup]\n    B --> C[Stored in config object]\n    C --> D[Used by core.ts pipeline]\n    D --> E[HTTP Request]\n    D --> F[File Write]\n    D --> G[Response Validation]\n```\n\n## Security Considerations\n\nThe write sandbox exists because the MCP tool is driven by a language model, which may be steered by content from a page it just fetched. Without sandboxing, a malicious page could induce the model to request writes outside expected directories.\n\nThe CLI intentionally has no sandbox: direct human invocation at the shell establishes the trust boundary.\n\nSource: [README.md:1-100]()\n</details>\n\n---\n\n<a id='write-sandbox'></a>\n\n## Write Sandbox Security\n\n### Related Pages\n\nRelated topics: [MCP Server Integration](#mcp-server), [Environment Variables](#environment-variables), [Error Handling](#error-handling)\n\n<details>\n<summary>Related source files</summary>\n\nThe following source files were used to generate this page:\n\n- [src/sandbox.ts](https://github.com/vasylenko/markfetch/blob/main/src/sandbox.ts)\n- [src/mcp.ts](https://github.com/vasylenko/markfetch/blob/main/src/mcp.ts)\n- [src/cli.ts](https://github.com/vasylenko/markfetch/blob/main/src/cli.ts)\n- [README.md](https://github.com/vasylenko/markfetch/blob/main/README.md)\n- [CHANGELOG.md](https://github.com/vasylenko/markfetch/blob/main/CHANGELOG.md)\n- [package.json](https://github.com/vasylenko/markfetch/blob/main/package.json)\n</details>\n\n# Write Sandbox Security\n\n## Overview\n\nThe Write Sandbox is a security mechanism in markfetch that restricts filesystem writes initiated via the MCP (Model Context Protocol) interface to a configurable set of allowed root directories. 
This protection prevents a language model, which may be influenced by fetched content, from writing files to arbitrary locations on the host system.\n\nThe sandbox enforces path containment by resolving symlinks and comparing canonicalized paths against the configured allowed roots. Any attempted write outside the sandbox boundary returns a `save_forbidden` error and the file is never created.\n\n## Purpose and Scope\n\n### Security Boundary\n\nThe sandbox exists because MCP tools are driven by a language model that can be steered by content from pages it fetches. Without containment:\n\n- A malicious or compromised webpage could instruct the LLM to write files to sensitive locations (e.g., `~/.ssh/authorized_keys`, `~/.bashrc`)\n- Path traversal attempts via symlinks could escape expected boundaries\n- Untrusted fetched content could modify configuration files or inject malicious code\n\nThe CLI mode intentionally has **no sandbox**. A human at the shell is considered the security boundary, as the user has direct control over command invocation and can review output before it reaches any model.\n\n### Scope Limitations\n\n| Scope | Sandboxed? |\n|-------|------------|\n| MCP server (`fetch_markdown` tool) | Yes |\n| CLI mode (`markfetch <url>`) | No |\n| Direct `node` execution | No |\n\nSource: [README.md:68-70](https://github.com/vasylenko/markfetch/blob/main/README.md)\n\n## Configuration\n\n### Environment Variable\n\n| Variable | Type | Default | Description |\n|----------|------|---------|-------------|\n| `MARKFETCH_ALLOWED_WRITE_ROOTS` | String | `os.tmpdir()` + `process.cwd()` | Path-delimiter-separated list of absolute paths permitted as MCP `savePath` write roots |\n\n### Path Delimiters\n\nThe delimiter varies by platform:\n\n| Platform | Delimiter | Example |\n|----------|-----------|---------|\n| POSIX (Linux, macOS) | `:` | `/tmp:/home/user/markfetch-out` |\n| Windows | `;` | `C:\\Users\\me\\markfetch-out;C:\\Temp` |\n\n### Behavior Rules\n\n1. 
**Replacement, not merge**: When set, the variable replaces the defaults entirely. To retain access to `os.tmpdir()` or `process.cwd()`, explicitly include them.\n\n2. **Validation at startup**: Malformed values (non-absolute entries, nonexistent directories) cause the server to fail fast with an error on stderr.\n\n3. **Realpath resolution**: Each root is resolved once via `fs.realpath` at startup to canonicalize symlinks.\n\nSource: [README.md:71-89](https://github.com/vasylenko/markfetch/blob/main/README.md)\n\n### Configuration Example\n\n**POSIX:**\n```json\n{\n  \"mcpServers\": {\n    \"markfetch\": {\n      \"command\": \"npx\",\n      \"args\": [\"-y\", \"markfetch\"],\n      \"env\": {\n        \"MARKFETCH_ALLOWED_WRITE_ROOTS\": \"/Users/me/markfetch-out:/tmp\"\n      }\n    }\n  }\n}\n```\n\n**Windows:**\n```json\n{\n  \"mcpServers\": {\n    \"markfetch\": {\n      \"command\": \"npx\",\n      \"args\": [\"-y\", \"markfetch\"],\n      \"env\": {\n        \"MARKFETCH_ALLOWED_WRITE_ROOTS\": \"C:\\\\Users\\\\me\\\\markfetch-out;C:\\\\Users\\\\me\\\\AppData\\\\Local\\\\Temp\"\n      }\n    }\n  }\n}\n```\n\n## Security Model\n\n### Path Resolution Flow\n\n```mermaid\ngraph TD\n    A[User provides savePath] --> B{Is path absolute?}\n    B -->|No| E[Error: savePath must be absolute]\n    B -->|Yes| C[Resolve via fs.realpath]\n    C --> D{Is resolved path inside allowed roots?}\n    D -->|Yes| F[Allow write to resolved path]\n    D -->|No| G[Return save_forbidden error]\n    \n    H[Allowed roots from env] --> I[Realpath-resolved at startup]\n    I --> D\n```\n\n### Symlink Handling\n\nThe sandbox protects against symlink-based escapes:\n\n1. **Resolve before check**: Symlinks are resolved via `fs.realpath` before containment validation\n2. **Reuse the resolved path at write time**: The canonicalized path from the validation check flows directly into `writeFile`, so the path is not resolved a second time\n3. 
**No lexical comparison**: A path like `<sandbox>/link/..` is not compared lexically against the roots; it is resolved first, then validated\n\nThis prevents attacks where a symlink planted inside the sandbox points outside, collapsing lexically for the check but resolving to an external location at write time.\n\nSource: [CHANGELOG.md:17-25](https://github.com/vasylenko/markfetch/blob/main/CHANGELOG.md)\n\n### Platform-Specific Behaviors\n\n| Platform | Case Sensitivity | Notes |\n|----------|------------------|-------|\n| Linux/macOS | Case-sensitive | Paths must match exactly |\n| Windows | Case-insensitive | `C:\\Users\\Bob` and `c:\\users\\bob` are equivalent |\n\nOn Windows, the containment check lowercases both the root and target paths before comparison.\n\nSource: [src/sandbox.ts:28-30](https://github.com/vasylenko/markfetch/blob/main/src/sandbox.ts)\n\n## Core Implementation\n\n### API Design\n\nThe sandbox module exposes two primary functions:\n\n```typescript\nfunction buildAllowedRoots(env: Record<string, string | undefined>): string[]\nfunction validateSavePath(\n  savePath: string,\n  roots: string[]\n): { ok: boolean; resolved?: string; reason?: string }\n```\n\n### `buildAllowedRoots()`\n\nParses `MARKFETCH_ALLOWED_WRITE_ROOTS` from environment variables:\n\n| Parameter | Type | Description |\n|-----------|------|-------------|\n| `env` | `Record<string, string \\| undefined>` | Process environment variables |\n\n| Return Type | Description |\n|-------------|-------------|\n| `string[]` | Array of absolute, realpath-resolved directory paths |\n\n**Logic:**\n1. If `MARKFETCH_ALLOWED_WRITE_ROOTS` is unset: return `[os.tmpdir(), process.cwd()]`\n2. If set: split by platform delimiter, validate each is absolute and exists\n3. 
Resolve each via `fs.realpath` for canonical form\n\n### `validateSavePath()`\n\nValidates that a save path is within the allowed roots:\n\n| Parameter | Type | Description |\n|-----------|------|-------------|\n| `savePath` | `string` | The requested save path |\n| `roots` | `string[]` | Allowed root directories |\n\n| Return Type | Description |\n|-------------|-------------|\n| `{ ok: true, resolved: string }` | Path is allowed; `resolved` is the canonicalized path for writing |\n| `{ ok: false, reason: string }` | Path is outside sandbox; `reason` describes the violation |\n\n**Validation steps:**\n1. Resolve `savePath` via `fs.realpath`\n2. For each root, compute relative path from root to resolved target\n3. If relative path is empty (same directory) or does not start with `..` and is not absolute: allow\n4. Otherwise: reject with reason listing allowed roots\n\nSource: [src/sandbox.ts:1-50](https://github.com/vasylenko/markfetch/blob/main/src/sandbox.ts)\n\n## Error Handling\n\n### Error Codes\n\n| Code | Condition | Response |\n|------|-----------|----------|\n| `save_forbidden` | `savePath` resolves outside allowed roots | No file written; MCP returns error |\n| `save_failed` | `savePath` is valid but `writeFile` fails | No file written; MCP returns error |\n\n### Error Message Format\n\n`save_forbidden` errors use the format:\n```\n[save_forbidden] '<path>' is outside the allowed write roots: ['/allowed/root1', '/allowed/root2']\n```\n\nThis provides:\n- The attempted path\n- The reason for rejection\n- The list of allowed roots for debugging\n\nSource: [src/mcp.ts:8-13](https://github.com/vasylenko/markfetch/blob/main/src/mcp.ts)\n\n## MCP Integration\n\n### Tool Schema\n\n```typescript\nserver.registerTool(\"fetch_markdown\", {\n  inputSchema: {\n    url: z.string().url().describe(\"...\"),\n    savePath: z.string()\n      .refine(isAbsolute, \"savePath must be an absolute filesystem path\")\n      .optional()\n      .describe(\"Optional. 
When provided, the fetched markdown is written to this absolute filesystem path...\")\n  }\n});\n```\n\n### Validation Flow\n\n1. MCP adapter receives `savePath` parameter\n2. Validates path is absolute (via Zod schema)\n3. Calls `validateSavePath(savePath, allowedRoots)`\n4. If `ok: false`: throw `MarkfetchError` with `save_forbidden` code\n5. If `ok: true`: use `resolved` path for `writeFile`\n\nSource: [src/mcp.ts:24-35](https://github.com/vasylenko/markfetch/blob/main/src/mcp.ts)\n\n## Architecture Diagram\n\n```mermaid\ngraph LR\n    subgraph MCP_Client\n        A[LLM sends fetch_markdown with savePath]\n    end\n    \n    subgraph MCP_Server\n        B[src/mcp.ts - MCP adapter]\n        C[src/core.ts - fetchMarkdown]\n        D[src/sandbox.ts - validateSavePath]\n    end\n    \n    subgraph File_System\n        E[fs.realpath resolution]\n        F[fs.writeFile]\n    end\n    \n    A --> B\n    B -->|validate path| D\n    D -->|resolve symlink| E\n    E -->|check containment| D\n    D -->|ok: true| C\n    C -->|write markdown| F\n    \n    D -->|ok: false| B\n    B -->|save_forbidden| A\n```\n\n## CLI vs MCP Behavior\n\n| Aspect | CLI Mode | MCP Mode |\n|--------|----------|----------|\n| Write sandbox | None | Enforced |\n| Path validation | Not performed | Required |\n| Symlink resolution | Not performed | Required |\n| `savePath` parameter | Optional, `-o` flag | Optional, tool parameter |\n| Relative path resolution | Resolves against cwd | Not allowed (must be absolute) |\n\nThe CLI adapter resolves relative paths internally for convenience, but the MCP adapter requires absolute paths and enforces the sandbox.\n\nSource: [src/cli.ts:6-18](https://github.com/vasylenko/markfetch/blob/main/src/cli.ts)\n\n## Security Considerations\n\n### Attack Vectors Mitigated\n\n1. **Path traversal**: `../../etc/passwd` is resolved before checking\n2. **Symlink escape**: `<sandbox>/link_to_external` is resolved and rejected\n3. 
**Case confusion (Windows)**: `C:\\Users\\Bob` equals `c:\\users\\bob`\n4. **Tilde expansion**: Not performed; shell expands `~` before argv reaches process\n\n### Remaining Trust Boundaries\n\n| Trust Level | Description |\n|-------------|-------------|\n| Filesystem permissions | Sandbox does not override OS file permissions |\n| Network | Does not prevent network-based attacks |\n| Content injection | Does not sanitize markdown content before writing |\n\n## Related Files\n\n| File | Role |\n|------|------|\n| `src/sandbox.ts` | Core sandbox validation logic |\n| `src/mcp.ts` | MCP server adapter, uses sandbox |\n| `src/cli.ts` | CLI adapter, no sandbox |\n| `src/core.ts` | Core fetch pipeline |\n| `README.md` | User documentation and configuration |\n| `CHANGELOG.md` | Historical security fix for symlink escape |\n\n## Changelog\n\n| Version | Change |\n|---------|--------|\n| 0.6.0 | Current release with full sandbox implementation |\n| 0.5.0 | CLI mode added (unrestricted by design) |\n| < 0.5.0 | MCP-only, sandbox introduced |\n\nSource: [package.json:3](https://github.com/vasylenko/markfetch/blob/main/package.json)\n\n---\n\n<a id='error-handling'></a>\n\n## Error Handling\n\n### Related Pages\n\nRelated topics: [Processing Pipeline](#processing-pipeline), [Write Sandbox Security](#write-sandbox), [Environment Variables](#environment-variables)\n\n<details>\n<summary>Related source files</summary>\n\nThe following source files were used to generate this page:\n\n- [src/core.ts](https://github.com/vasylenko/markfetch/blob/main/src/core.ts)\n- [src/cli.ts](https://github.com/vasylenko/markfetch/blob/main/src/cli.ts)\n- [src/mcp.ts](https://github.com/vasylenko/markfetch/blob/main/src/mcp.ts)\n- [src/sandbox.ts](https://github.com/vasylenko/markfetch/blob/main/src/sandbox.ts)\n- [README.md](https://github.com/vasylenko/markfetch/blob/main/README.md)\n- [CHANGELOG.md](https://github.com/vasylenko/markfetch/blob/main/CHANGELOG.md)\n</details>\n\n# Error Handling\n\nmarkfetch implements a deterministic, structured error handling system that 
provides consistent error reporting across both CLI and MCP interfaces. All errors are categorized into specific codes that enable precise failure diagnosis and appropriate recovery strategies.\n\n## Error Code Reference\n\nmarkfetch defines eight deterministic error codes that cover all failure scenarios. Each code is designed to be actionable, helping callers understand exactly what went wrong and how to respond.\n\n| Error Code | Meaning | Typical Cause |\n|---|---|---|\n| `network_error` | DNS, TCP, or TLS failure | Firewall blocking, network unavailable, invalid hostname |\n| `http_error` | Non-2xx HTTP response | 404 page not found, 403 forbidden, 500 server error |\n| `timeout` | Request exceeded `MARKFETCH_TIMEOUT_MS` | Slow server, large page, network latency |\n| `unsupported_content_type` | Response is not HTML | Binary files, JSON APIs, PDF documents |\n| `extraction_failed` | Readability found no article content | Pure client-rendered SPAs with no static HTML |\n| `too_large` | Body or markdown exceeded `MARKFETCH_MAX_BYTES` | Very large articles with embedded media |\n| `save_failed` | File write operation failed | Missing parent directory, permission denied |\n| `save_forbidden` | Save path outside allowed write roots | Path traverses symlink outside sandbox |\n\nSource: [README.md](README.md)\n\n## Error Architecture\n\nThe error handling system follows a layered architecture where core validation and error creation happen in `src/core.ts`, while each adapter (CLI and MCP) provides interface-specific error formatting and reporting.\n\n```mermaid\ngraph TD\n    A[Request] --> B[core.ts Validation]\n    B --> C{Error Condition?}\n    C -->|No| D[Successful Fetch]\n    C -->|Yes| E[MarkfetchError Thrown]\n    E --> F[Adapter Layer]\n    F --> G[CLI Adapter]\n    F --> H[MCP Adapter]\n    G --> I[\"stderr: [code] message\"]\n    H --> J[\"content[0].text: [code] message\"]\n    J --> K[isError: true]\n```\n\nSource: [src/core.ts](src/core.ts), 
[src/cli.ts](src/cli.ts), [src/mcp.ts](src/mcp.ts)\n\n## MarkfetchError Class\n\nThe central error type is `MarkfetchError`, which encapsulates both the error code and human-readable message. This class serves as the single error type thrown throughout the application.\n\n```typescript\nclass MarkfetchError {\n  constructor(\n    public readonly code: ErrorCode,\n    public readonly message: string\n  ) {}\n}\n```\n\nSource: [src/core.ts:1-100](src/core.ts)\n\n## Environment Variable Validation\n\nmarkfetch validates configuration environment variables at startup to fail fast on misconfiguration rather than producing confusing per-request errors.\n\n| Variable | Default | Validation Rules |\n|---|---|---|\n| `MARKFETCH_TIMEOUT_MS` | `30000` | Positive integer |\n| `MARKFETCH_MAX_BYTES` | `5000000` | Positive integer |\n| `MARKFETCH_USER_AGENT` | Chrome 130 UA string | Must contain Chrome substring |\n\nThe `intEnv` function performs validation:\n\n```typescript\nfunction intEnv(name: string, fallback: number): number {\n  const raw = process.env[name];\n  if (raw == null || raw === \"\") return fallback;\n  const n = Number(raw);\n  if (!Number.isFinite(n) || !Number.isInteger(n) || n <= 0) {\n    throw new Error(\n      `Invalid ${name}=${JSON.stringify(raw)} — expected a positive integer.`,\n    );\n  }\n  return n;\n}\n```\n\nSource: [src/core.ts:1-100](src/core.ts)\n\n### User-Agent Validation\n\nThe `MARKFETCH_USER_AGENT` must be a valid Chrome User-Agent string. 
This requirement exists because Sec-CH-UA-* client hints are derived from the User-Agent at startup, and a mismatch creates a stronger bot signal.\n\n```typescript\nfunction deriveClientHints(ua: string): {\n  brands: string;\n  mobile: string;\n  platform: string;\n} {\n  const versionMatch = /\\bChrome\\/(\\d+)/.exec(ua);\n  if (!versionMatch) {\n    throw new Error(\n      `Invalid MARKFETCH_USER_AGENT=${JSON.stringify(ua)} — expected a Chrome User-Agent containing \"Chrome/VERSION\".`,\n    );\n  }\n  // ...\n}\n```\n\nSource: [src/core.ts:1-100](src/core.ts)\n\n## CLI Error Handling\n\nThe CLI adapter catches errors thrown from core and formats them for stderr output. Error output follows a consistent `[code] message` format that matches the MCP error format exactly.\n\n```typescript\ntry {\n  const { markdown, bytes, savedTo } = await fetchMarkdown({\n    url,\n    savePath,\n  });\n  // ... success handling\n} catch (err) {\n  const { code, message } = classifyError(err);\n  console.error(`[${code}] ${message}`);\n  // Use exitCode so pending output drains before process exits\n  process.exitCode = 1;\n}\n```\n\nSource: [src/cli.ts:1-50](src/cli.ts)\n\n### CLI Exit Codes\n\n| Scenario | Exit Code | Output |\n|---|---|---|\n| Success (stdout) | 0 | Raw markdown |\n| Success (save to file) | 0 | `Saved X bytes to /path` |\n| Any error | 1 | `[code] message` to stderr |\n\nThe use of `process.exitCode = 1` (rather than `process.exit(1)`) ensures pending stdout/stderr output drains before the process terminates, which is important when stdout is piped to a slow consumer.\n\nSource: [src/cli.ts:1-50](src/cli.ts)\n\n## MCP Error Handling\n\nThe MCP adapter returns errors in a format compatible with the MCP protocol. 
Errors appear in the `content[0].text` field with `isError: true` set.\n\n```typescript\nfunction errorResult(code: ErrorCode, message: string) {\n  return {\n    content: [{ type: \"text\" as const, text: `[${code}] ${message}` }],\n    isError: true,\n  };\n}\n```\n\nSource: [src/mcp.ts:1-50](src/mcp.ts)\n\n### MCP Response Structure for Errors\n\n```json\n{\n  \"content\": [\n    {\n      \"type\": \"text\",\n      \"text\": \"[network_error] DNS lookup failed\"\n    }\n  ],\n  \"isError\": true\n}\n```\n\nSource: [src/mcp.ts:1-50](src/mcp.ts)\n\n## Write Sandbox Errors\n\nThe MCP interface enforces a write sandbox that restricts file saves to configured root directories. Errors occur when `savePath` resolves to a location outside the allowed roots.\n\n```typescript\nexport function checkWritePath(\n  target: string,\n  roots: string[],\n): { ok: true; resolved: string } | { ok: false; reason: string } {\n  // ... validation logic\n  return {\n    ok: false,\n    reason: `'${reattached}' is outside the allowed write roots: [${roots.map((r) => `'${r}'`).join(\", \")}]`,\n  };\n}\n```\n\nSource: [src/sandbox.ts:1-100](src/sandbox.ts)\n\n### Allowed Write Roots Configuration\n\n| Platform | Default Roots | Delimiter |\n|---|---|---|\n| POSIX | `os.tmpdir()` + `process.cwd()` | `:` |\n| Windows | `os.tmpdir()` + `process.cwd()` | `;` |\n\nOverride with the `MARKFETCH_ALLOWED_WRITE_ROOTS` environment variable. When set, this **replaces** the defaults entirely rather than merging.\n\nSource: [README.md](README.md)\n\n### Symlink Handling\n\nThe sandbox correctly resolves symlinks to prevent escape attempts like `<sandbox>/link/../out.md` where `link` points outside the sandbox. 
The canonicalized path flows from the containment check into `writeFile`, ensuring the file is created exactly at the validated location.\n\nSource: [CHANGELOG.md](CHANGELOG.md), [src/sandbox.ts:1-100](src/sandbox.ts)\n\n## Error Classification\n\nThe `classifyError` function normalizes any thrown value into a `{ code, message }` pair used throughout the system:\n\n```typescript\nfunction classifyError(err: unknown): { code: string; message: string } {\n  if (err instanceof MarkfetchError) {\n    return { code: err.code, message: err.message };\n  }\n  if (err instanceof Error) {\n    return { code: \"network_error\", message: err.message };\n  }\n  return { code: \"network_error\", message: String(err) };\n}\n```\n\nSource: [src/core.ts:1-100](src/core.ts)\n\n### Error Source Mapping\n\n| Error Source | Code Produced |\n|---|---|\n| `MarkfetchError` instances | Original code preserved |\n| `Error` instances | `network_error` |\n| Non-Error values | `network_error` with string coercion |\n\n## Unified Error Flow\n\nVersion 0.5.0 introduced a refactoring where three inline `return errorResult(...)` sites in the MCP handler were converted to throw `MarkfetchError` from core uniformly. Both adapters now catch and convert errors consistently.\n\nThis architectural change ensures that both CLI and MCP interfaces produce identical error codes and messages for the same failure conditions.\n\nSource: [CHANGELOG.md](CHANGELOG.md)\n\n## Best Practices for Error Handling\n\n### For MCP Clients\n\n1. Check the `isError` field in the response object\n2. Parse the `content[0].text` field for the `[code] message` format\n3. Handle `extraction_failed` gracefully for client-rendered SPAs\n4. Use the `savePath` parameter for large responses to avoid tool-result truncation\n\n### For CLI Consumers\n\n1. Redirect stderr to capture error codes\n2. Parse the `[code] message` format from stderr\n3. Use `markfetch url 2>&1 | head -1` to get the error\n\n### For Save Operations\n\n1. 
Always use absolute paths for `savePath`\n2. Verify `MARKFETCH_ALLOWED_WRITE_ROOTS` includes your target directory\n3. Check for `save_forbidden` before `save_failed` in error handling logic\n\n---\n\n<a id='development'></a>\n\n## Development Guide\n\n### Related Pages\n\nRelated topics: [Introduction](#introduction), [Quick Start Guide](#quickstart)\n\n<details>\n<summary>Related source files</summary>\n\nThe following source files were used to generate this page:\n\n- [package.json](https://github.com/vasylenko/markfetch/blob/main/package.json)\n- [src/core.ts](https://github.com/vasylenko/markfetch/blob/main/src/core.ts)\n- [src/cli.ts](https://github.com/vasylenko/markfetch/blob/main/src/cli.ts)\n- [src/mcp.ts](https://github.com/vasylenko/markfetch/blob/main/src/mcp.ts)\n- [src/sandbox.ts](https://github.com/vasylenko/markfetch/blob/main/src/sandbox.ts)\n- [README.md](https://github.com/vasylenko/markfetch/blob/main/README.md)\n</details>\n\n# Development Guide\n\nThis guide provides comprehensive information for developers who want to understand, extend, or contribute to markfetch.\n\n## Overview\n\nmarkfetch is a Node.js tool that fetches URLs and converts web content to clean markdown. It operates in two modes:\n\n1. **CLI Mode** - Command-line interface for shell integration\n2. **MCP Mode** - Model Context Protocol server for AI agent integration\n\nThe project requires Node.js ≥ 24 and is distributed as an npm package. 
Sources: [package.json:8]()\n\n## Architecture\n\n```mermaid\ngraph TD\n    A[User Input] --> B{process.argv.length}\n    B -->|≥ 1 arg| C[CLI Adapter]\n    B -->|Zero args| D[MCP Adapter]\n    \n    C --> E[src/cli.ts]\n    D --> F[src/mcp.ts]\n    \n    E --> G[src/core.ts]\n    F --> G\n    \n    G --> H[undici HTTP Client]\n    G --> I[linkedom HTML Parser]\n    G --> J[@mozilla/readability]\n    G --> K[turndown]\n    \n    H --> L[HTTP Response]\n    I --> M[DOM Document]\n    J --> N[Extracted Article]\n    K --> O[Markdown Output]\n```\n\nThe dispatcher treats `process.argv.length === 2` as zero user arguments (the two entries are the node path and the script path) and routes to the MCP adapter; any extra argument routes to the CLI adapter.\n\n### Core Pipeline (src/core.ts)\n\nThe core module implements the main fetch-and-convert pipeline. It orchestrates:\n\n| Component | Role |\n|-----------|------|\n| `undici` | HTTP/2 transport with Chrome-like fingerprinting |\n| `linkedom` | HTML parsing to DOM |\n| `@mozilla/readability` | Article content extraction |\n| `turndown` | HTML to markdown conversion |\n\nSources: [src/core.ts:1-50]()\n\n### Adapters (src/cli.ts & src/mcp.ts)\n\nThe source is structured into four files:\n\n| File | Purpose |\n|------|---------|\n| `src/core.ts` | Pipeline + errors (shared logic) |\n| `src/mcp.ts` | MCP stdio server adapter |\n| `src/cli.ts` | CLI argv parser + dispatcher |\n| `src/index.ts` | Lazy-import dispatcher based on `process.argv.length` |\n\nSources: [README.md:95-100]()\n\nThe lazy-import dispatcher ensures `console.log` calls in `cli.ts` are never reachable from the MCP path, maintaining the invariant that stdout is reserved for MCP frames. 
Sources: [CHANGELOG.md:45-47]()\n\n## Setting Up the Development Environment\n\n### Prerequisites\n\n- Node.js ≥ 24\n- npm or yarn\n\n### Installation\n\n```bash\n# Clone the repository\ngit clone https://github.com/vasylenko/markfetch.git\ncd markfetch\n\n# Install dependencies\nnpm install\n```\n\n### Available Scripts\n\n| Script | Command | Purpose |\n|--------|---------|---------|\n| `dev` | `npm run dev` | Run source directly with tsx (no build required) |\n| `build` | `npm run build` | Compile TypeScript to JavaScript |\n| `test` | `npm run test` | Run test suite with tsx |\n| `inspect` | `npm run inspect` | Launch MCP inspector for debugging |\n\nSources: [package.json:21-28]()\n\n### Build Process\n\nThe build process consists of two steps:\n\n```bash\n# Compile TypeScript; npm then runs the postbuild script automatically\nnpm run build\n\n# Only needed to re-run the post-build step by hand\nnpm run postbuild\n```\n\nThe postbuild script (`scripts/postbuild.mjs`) sets the execute bit on `dist/index.js` so the shebang-based launch works; `tsc` preserves the shebang but does not chmod its output. Sources: [package.json:26]()\n\n## Project Structure\n\n```\nmarkfetch/\n├── src/\n│   ├── index.ts      # Entry point with argv dispatcher\n│   ├── core.ts       # Core fetch/extract/convert pipeline\n│   ├── cli.ts        # CLI adapter using commander\n│   ├── mcp.ts        # MCP stdio server\n│   └── sandbox.ts    # Write path sandboxing\n├── dist/             # Compiled JavaScript output\n├── tests/            # Test fixtures and test files\n├── scripts/\n│   └── postbuild.mjs # Sets the execute bit on dist/index.js\n└── docs/\n    └── SPEC.md       # Detailed specification\n```\n\n## Configuration\n\n### Environment Variables\n\n| Variable | Default | Purpose |\n|----------|---------|---------|\n| `MARKFETCH_TIMEOUT_MS` | `30000` | Per-request timeout in milliseconds |\n| `MARKFETCH_MAX_BYTES` | `5000000` | Cap on response body and extracted markdown |\n| `MARKFETCH_USER_AGENT` | Chrome 130 string | Override the User-Agent header |\n| `MARKFETCH_ALLOWED_WRITE_ROOTS` | `os.tmpdir()` + 
`process.cwd()` | MCP-only write sandbox roots |\n\nSources: [README.md:60-66]()\n\n### Configuration Precedence\n\n1. Environment variables set at startup\n2. Command-line flags (CLI mode)\n3. MCP tool parameters (MCP mode)\n\n## Core API\n\n### fetchMarkdown Function\n\n`fetchMarkdown` is the main function exported from `core.ts`. Its option and result shapes:\n\n```typescript\ninterface FetchOptions {\n  url: string;\n  savePath?: string;\n}\n\ninterface FetchResult {\n  markdown: string;\n  bytes: number;\n  savedTo?: string;\n}\n```\n\n### Error Handling\n\nmarkfetch uses eight deterministic error codes. Core throws seven of them; `save_forbidden` is emitted only by the MCP adapter, before `fetchMarkdown` runs:\n\n| Code | Meaning |\n|------|---------|\n| `network_error` | DNS/TCP/TLS failure |\n| `http_error` | Non-2xx HTTP status |\n| `timeout` | Request timeout exceeded |\n| `unsupported_content_type` | Not `text/html` or `application/xhtml+xml` |\n| `extraction_failed` | Readability found no article content |\n| `too_large` | Response or markdown exceeded size cap |\n| `save_failed` | File write failed (permissions, missing directory) |\n| `save_forbidden` | Path outside allowed write roots (MCP adapter only) |\n\nSources: [README.md:71-80]()\n\nCore throws every error as a `MarkfetchError`; each adapter catches it once and converts it to its own output format. Sources: [CHANGELOG.md:49-51]()\n\n## Extending the Pipeline\n\n### Adding New HTML Rewrites\n\nThe `rewriteForReadability()` function in `core.ts` handles pre-extraction HTML transformations:\n\n```typescript\nfunction rewriteForReadability(document: Document): void {\n  // Transform <aside class=\"footnote-brackets\"> to <section>\n  // Flatten <details> elements\n  // Replace div.mw-heading with their heading children\n}\n```\n\nTo add new rewrite rules, append them at the end of this function body. 
Sources: [src/core.ts:120-160]()\n\n### Customizing Markdown Conversion\n\nThe `TURNDOWN` instance is configured with:\n\n| Plugin/Option | Purpose |\n|---------------|---------|\n| `gfm` plugin | GitHub Flavored Markdown support |\n| `keepClasses: true` | Preserve `class=\"language-X\"` for code fences |\n| Custom escape | Handle `-`/`=` after inline elements |\n\nSources: [src/core.ts:50-90]()\n\n### Modifying Error Handling\n\nError handling flows through the `MarkfetchError` class in core:\n\n1. Core throws `MarkfetchError` with code and message\n2. Adapters catch and format for their protocol\n3. CLI: writes `[code] message` to stderr and exits 1\n4. MCP: returns `{ content: [...], isError: true }`\n\nSources: [src/cli.ts:35-42]() and [src/mcp.ts:15-20]()\n\n## Write Sandbox\n\nThe MCP adapter enforces write path restrictions; the CLI is intentionally unbounded, since a human at the shell is the security boundary there:\n\n```mermaid\ngraph TD\n    A[MCP savePath] --> B{Absolute path?}\n    B -->|No| C[Refine fails: savePath must be absolute]\n    B -->|Yes| D{Inside allowed roots?}\n    D -->|Yes| E[Write file]\n    D -->|No| F[Return save_forbidden error]\n```\n\n### Configuring Allowed Roots\n\nSet the environment variable, separating roots with the platform path delimiter (`:` on POSIX, `;` on Windows):\n\n```bash\n# POSIX\nexport MARKFETCH_ALLOWED_WRITE_ROOTS=\"/tmp:/home/user/docs\"\n\n# Windows (cmd.exe): quote the whole assignment so the quotes are not stored in the value\nset \"MARKFETCH_ALLOWED_WRITE_ROOTS=C:\\Users\\me\\docs;C:\\temp\"\n```\n\nThe sandbox check resolves symlinks and applies case-folding on Windows. Sources: [src/sandbox.ts:20-40]()\n\n## Testing\n\n### Running Tests\n\n```bash\nnpm test\n```\n\n### Test Structure\n\nTests use the Node.js built-in test runner (`--test` flag) with tsx for TypeScript support. Sources: [package.json:27]()\n\n### Writing New Tests\n\n1. Place test files in the `tests/` directory\n2. Use the `*.test.ts` naming pattern\n3. 
Run with `tsx --test tests/*.test.ts`\n\n## MCP Inspector\n\nDebug MCP integration using the official inspector:\n\n```bash\nnpm run inspect\n```\n\nThis launches the MCP inspector at `http://localhost:6274` where you can:\n- Test tool calls interactively\n- Inspect request/response frames\n- Verify schema validation\n\nSources: [package.json:27]()\n\n## Dependencies\n\n### Production Dependencies\n\n| Package | Version | Purpose |\n|---------|---------|---------|\n| `@modelcontextprotocol/sdk` | ^1.29.0 | MCP server implementation |\n| `@mozilla/readability` | ^0.5.0 | Article extraction |\n| `commander` | ^14.0.3 | CLI argument parsing |\n| `linkedom` | ^0.18.0 | HTML parsing |\n| `turndown` | ^7.0.0 | HTML to markdown |\n| `turndown-plugin-gfm` | ^1.0.2 | GFM support |\n| `undici` | ^8.2.0 | HTTP client |\n| `zod` | ^3.0.0 | Schema validation |\n\n### Development Dependencies\n\n| Package | Purpose |\n|---------|---------|\n| `@types/node` | Node.js type definitions |\n| `@types/turndown` | Turndown type definitions |\n| `tsx` | TypeScript execution |\n| `typescript` | TypeScript compiler |\n\nSources: [package.json:30-50]()\n\n## Version History\n\n| Version | Date | Key Changes |\n|---------|------|-------------|\n| 0.6.0 | 2026-05-13 | Write sandbox, Windows CI, save_forbidden error |\n| 0.5.0 | 2026-05-12 | CLI mode, commander dependency |\n| 0.4.1 | 2026-05-11 | README rewrite, bin path fix |\n| 0.4.0 | 2026-05-10 | MCP server with fetch_markdown tool |\n\nSources: [CHANGELOG.md:1-60]()\n\n## Contributing Guidelines\n\n### Code Standards\n\n- All source in TypeScript under `src/`\n- Build output to `dist/` via `npm run build`\n- Tests in `tests/` with `*.test.ts` pattern\n- No runtime `console.log` in MCP path (enforced by lazy-import structure)\n\n### Pull Request Checklist\n\n- [ ] Run `npm run build` successfully\n- [ ] Run `npm test` with all tests passing\n- [ ] Update CHANGELOG.md with changes\n- [ ] Ensure documentation reflects new behavior\n\n### Release 
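Process\n\n```bash\nnpm run prepublishOnly\n```\n\nThis runs the build. npm also invokes the `prepublishOnly` script automatically before `npm publish`. Sources: [package.json:29]()\n\n---\n\n## Doramagic 踩坑日志\n\nProject: vasylenko/markfetch\n\nSummary: 7 potential pitfalls found, 0 of them high/blocking; highest priority: installation pitfall, source evidence: v0.4.1.\n\n## 1. Installation pitfall · Source evidence: v0.4.1\n\n- Severity: medium\n- Evidence strength: source_linked\n- Finding: GitHub community evidence points to an unverified installation-related issue in this project: v0.4.1\n- User impact: may raise the cost of trial and production adoption for new users.\n- Suggested check: the source indicates a fix, workaround, or version change may already exist; the manual must state the applicable version.\n- Safeguard: do not inflate this into a definitive conclusion detached from the source link; state the applicable version and review status.\n- Evidence: community_evidence:github | cevd_749b65614f7b40e0b524f4e932cd4aca | https://github.com/vasylenko/markfetch/releases/tag/v0.4.1 | The source discussion mentions Node-related conditions that need to be re-checked before installation or trial.\n\n## 2. Capability pitfall · Capability assessment depends on assumptions\n\n- Severity: medium\n- Evidence strength: source_linked\n- Finding: README/documentation is current enough for a first validation pass.\n- User impact: if the assumption does not hold, users will not get the promised capability.\n- Suggested check: turn the assumption into a downstream verification checklist.\n- Safeguard: assumptions must become verification items; they cannot be written as facts before verification results exist.\n- Evidence: capability.assumptions | github_repo:1234238440 | https://github.com/vasylenko/markfetch | README/documentation is current enough for a first validation pass.\n\n## 3. Maintenance pitfall · Maintenance activity unknown\n\n- Severity: medium\n- Evidence strength: source_linked\n- Finding: last_activity_observed is not recorded.\n- User impact: new, abandoned, and active projects get lumped together, lowering trust in the recommendation.\n- Suggested check: add signals from recent GitHub commits, releases, and issue/PR responses.\n- Safeguard: while maintenance activity is unknown, recommendation strength must not be marked high-trust.\n- Evidence: evidence.maintainer_signals | github_repo:1234238440 | https://github.com/vasylenko/markfetch | last_activity_observed missing\n\n## 4. Security/permissions pitfall · Downstream validation found risk items\n\n- Severity: medium\n- Evidence strength: source_linked\n- Finding: no_demo\n- User impact: downstream validation already requires review; this must not be played down on the page.\n- Suggested check: route to the security/permissions governance review queue.\n- Safeguard: while a downstream risk exists, keep the review/recommendation downgraded.\n- Evidence: downstream_validation.risk_items | github_repo:1234238440 | https://github.com/vasylenko/markfetch | no_demo; severity=medium\n\n## 5. Security/permissions pitfall · Scoring risk present\n\n- Severity: medium\n- Evidence strength: source_linked\n- Finding: no_demo\n- User impact: the risk affects whether the project is suitable for ordinary users to install.\n- Suggested check: record the risk in the boundary card and confirm whether manual review is needed.\n- Safeguard: scoring risks must appear in the boundary card, not only as an internal score.\n- Evidence: risks.scoring_risks | github_repo:1234238440 | https://github.com/vasylenko/markfetch | no_demo; severity=medium\n\n## 6. 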
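Maintenance pitfall · Issue/PR response quality unknown\n\n- Severity: low\n- Evidence strength: source_linked\n- Finding: issue_or_pr_quality=unknown.\n- User impact: users cannot tell whether anyone will maintain the project when they hit a problem.\n- Suggested check: sample recent issues/PRs to see whether they go unhandled for long periods.\n- Safeguard: while issue/PR responsiveness is unknown, a maintenance-risk notice is required.\n- Evidence: evidence.maintainer_signals | github_repo:1234238440 | https://github.com/vasylenko/markfetch | issue_or_pr_quality=unknown\n\n## 7. Maintenance pitfall · Release cadence unclear\n\n- Severity: low\n- Evidence strength: source_linked\n- Finding: release_recency=unknown.\n- User impact: install commands and documentation may lag behind the code, raising the chance that users hit pitfalls.\n- Suggested check: confirm that the latest release/tag matches the README install command.\n- Safeguard: when release cadence is unknown or stale, install instructions must note possible drift.\n- Evidence: evidence.maintainer_signals | github_repo:1234238440 | https://github.com/vasylenko/markfetch | release_recency=unknown\n\n<!-- canonical_name: vasylenko/markfetch; human_manual_source: deepwiki_human_wiki -->\n",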
      "markdown_key": "markfetch",
      "pages": "draft",
      "source_refs": [
        {
          "evidence_id": "github_repo:1234238440",
          "kind": "repo",
          "supports_claim_ids": [
            "claim_identity",
            "claim_distribution",
            "claim_capability"
          ],
          "url": "https://github.com/vasylenko/markfetch"
        },
        {
          "evidence_id": "art_af64b5f930b64736aa1d4abc1e690f07",
          "kind": "docs",
          "supports_claim_ids": [
            "claim_identity",
            "claim_distribution",
            "claim_capability"
          ],
          "url": "https://github.com/vasylenko/markfetch#readme"
        }
      ],
      "summary": "DeepWiki/Human Wiki 完整输出，末尾追加 Discovery Agent 踩坑日志。",
      "title": "markfetch 说明书",
      "toc": [
        "https://github.com/vasylenko/markfetch 项目说明书",
        "目录",
        "Introduction",
        "What is markfetch?",
        "Core Design Philosophy",
        "Architecture Overview",
        "Two Operating Modes",
        "Content Extraction Pipeline",
        "Doramagic 踩坑日志"
      ]
    }
  },
  "quality_gate": {
    "blocking_gaps": [],
    "category_confidence": "medium",
    "compile_status": "ready_for_review",
    "five_assets_present": true,
    "install_sandbox_verified": true,
    "missing_evidence": [],
    "next_action": "publish to Doramagic.ai project surfaces",
    "prompt_preview_boundary_ok": true,
    "publish_status": "publishable",
    "quick_start_verified": true,
    "repo_clone_verified": true,
    "repo_commit": "bab725135ec30a217db6f34618e5e27772cee1e7",
    "repo_inspection_error": null,
    "repo_inspection_files": [
      "package.json",
      "README.md",
      "docs/SPEC.md",
      "src/index.ts",
      "src/mcp.ts",
      "src/cli.ts",
      "src/sandbox.ts",
      "src/core.ts"
    ],
    "repo_inspection_verified": true,
    "review_reasons": [
      "community_discussion_evidence_below_public_threshold"
    ],
    "tag_count_ok": true,
    "unsupported_claims": []
  },
  "schema_version": "0.1",
  "user_assets": {
    "ai_context_pack": {
      "asset_id": "ai_context_pack",
      "filename": "AI_CONTEXT_PACK.md",
      "markdown": "# markfetch - Doramagic AI Context Pack\n\n> 定位：安装前体验与判断资产。它帮助宿主 AI 有一个好的开始，但不代表已经安装、执行或验证目标项目。\n\n## 充分原则\n\n- **充分原则，不是压缩原则**：AI Context Pack 应该充分到让宿主 AI 在开工前理解项目价值、能力边界、使用入口、风险和证据来源；它可以分层组织，但不以最短摘要为目标。\n- **压缩策略**：只压缩噪声和重复内容，不压缩会影响判断和开工质量的上下文。\n\n## 给宿主 AI 的使用方式\n\n你正在读取 Doramagic 为 markfetch 编译的 AI Context Pack。请把它当作开工前上下文：帮助用户理解适合谁、能做什么、如何开始、哪些必须安装后验证、风险在哪里。不要声称你已经安装、运行或执行了目标项目。\n\n## Claim 消费规则\n\n- **事实来源**：Repo Evidence + Claim/Evidence Graph；Human Wiki 只提供显著性、术语和叙事结构。\n- **事实最低状态**：`supported`\n- `supported`：可以作为项目事实使用，但回答中必须引用 claim_id 和证据路径。\n- `weak`：只能作为低置信度线索，必须要求用户继续核实。\n- `inferred`：只能用于风险提示或待确认问题，不能包装成项目事实。\n- `unverified`：不得作为事实使用，应明确说证据不足。\n- `contradicted`：必须展示冲突来源，不得替用户强行选择一个版本。\n\n## 它最适合谁\n\n- **正在使用 Claude/Codex/Cursor/Gemini 等宿主 AI 的开发者**：README 或插件配置提到多个宿主 AI。 证据：`README.md` Claim：`clm_0002` supported 0.86\n\n## 它能做什么\n\n- **命令行启动或安装流程**（需要安装后验证）：项目文档中存在可执行命令，真实使用需要在本地或宿主环境中运行这些命令。 证据：`README.md` Claim：`clm_0001` supported 0.86\n\n## 怎么开始\n\n- `npm i -g markfetch` 证据：`README.md` Claim：`clm_0003` supported 0.86, `clm_0008` supported 0.86\n- `claude mcp add --scope user markfetch -- npx -y markfetch` 证据：`README.md` Claim：`clm_0004` supported 0.86\n- `npx -y markfetch https://example.com/article` 证据：`README.md` Claim：`clm_0005` supported 0.86, `clm_0006` supported 0.86, `clm_0007` supported 0.86\n- `npx -y markfetch https://example.com/article -o article.md` 证据：`README.md` Claim：`clm_0006` supported 0.86\n- `npx -y markfetch https://example.com/article | pandoc -o article.pdf` 证据：`README.md` Claim：`clm_0007` supported 0.86\n- `npm i -g markfetch         # then anywhere: markfetch <url>` 证据：`README.md` Claim：`clm_0008` supported 0.86\n- `npm i -D markfetch         # then in package.json scripts: \"markfetch <url>\"` 证据：`README.md` Claim：`clm_0009` supported 0.86\n\n## 继续前判断卡\n\n- **当前建议**：先做权限沙盒试用\n- **为什么**：项目存在安装命令、宿主配置或本地写入线索，不建议直接进入主力环境，应先在隔离环境试装。\n\n### 30 秒判断\n\n- **现在怎么做**：先做权限沙盒试用\n- **最小安全下一步**：先跑 Prompt 
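Preview first; if you still want to install, trial-install only in an isolated environment\n- **Do not trust yet**: tool permission boundaries cannot be trusted before installation.\n- **Continuing will touch**: command execution, the local environment or project files, and the host AI context\n\n### What You Can Trust Now\n\n- **Audience signal: developers using host AIs such as Claude/Codex/Cursor/Gemini** (supported): backed by a supported claim or project evidence, but that still does not equal real post-install results. Evidence: `README.md` Claim: `clm_0002` supported 0.86\n- **Capability exists: command-line launch or installation flow** (supported): you can trust that the project contains signs of this capability; whether it fits your specific task still requires a trial or post-install verification. Evidence: `README.md` Claim: `clm_0001` supported 0.86\n- **Quick Start / install command signals exist** (supported): you can trust that the project documentation shows a launch or install entry point; do not run it directly in your main environment on that basis. Evidence: `README.md` Claim: `clm_0003` supported 0.86, `clm_0008` supported 0.86\n\n### What You Cannot Trust Yet\n\n- **Tool permission boundaries cannot be trusted before installation.** (unverified): MCP/tool projects typically touch files, the network, the browser, or external APIs; permissions and logs must be checked for real.\n- **Real output quality cannot be trusted before installation.** (unverified): Prompt Preview only shows how the guidance works; it cannot prove result quality in a real project.\n- **Host AI version compatibility cannot be trusted before installation.** (unverified): loading rules and version differences across hosts such as Claude, Cursor, Codex, and Gemini must be verified in a real environment.\n- **That it will not pollute existing host AI behavior cannot be taken on trust.** (inferred): Skill, plugin, and AGENTS/CLAUDE/GEMINI instructions may change the host AI's default behavior.\n- **Safe rollback cannot be assumed.** (unverified): unless the project explicitly provides uninstall and restore instructions, verify in an isolated environment first.\n- **Will a real installation be compatible with the user's current host AI version?** (unverified): compatibility can only be verified in the actual host environment.\n- **Does the project's output quality meet the user's specific task?** (unverified): a pre-install preview only shows the flow and boundaries; it cannot replace real evaluation.\n- **Does the install command need network access, permissions, or global writes?** (unverified): this affects installation risk in both enterprise and personal environments. Evidence: `README.md`\n\n### What Continuing Will Touch\n\n- **Command execution**: package managers, network downloads, local plugin directories, project configuration, or the user's home directory. Reason: running the first command can already change the environment; decide first whether it is worth running. Evidence: `README.md`\n- **Local environment or project files**: install results, plugin caches, project configuration, or local dependency directories. Reason: the write scope and rollback method cannot be proven before installation, so isolated verification is needed. Evidence: `README.md`\n- **Host AI context**: the AI Context Pack, Prompt Preview, Skill routing, risk rules, and project facts. Reason: importing context affects the host AI's later judgments; unverified items must not be presented as facts.\n\n### Minimal Safe Next Steps\n\n- **Run Prompt Preview first**: use the pre-install interactive trial to judge whether the working style fits; no authorization or environment changes are needed. (Applies to: any project, especially when output quality is unknown.)\n- **Trial-install only in an isolated directory or test account**: this keeps install commands from polluting your main host AI, real projects, or home directory. (Applies when: there are signs of command execution, plugin configuration, or local writes.)\n- **After installing, verify just one minimal task**: verify loading, compatibility, output quality, and rollback first, then decide whether to go deeper. (Applies when: moving from trial into a real workflow.)\n\n### Exit Strategy\n\n- **Preserve the pre-install state**: record the original host configuration and project state so you can later judge whether recovery is possible.\n- **Record install commands and write paths**: without explicit uninstall instructions, you at least need to know which directories or configuration need manual cleanup.\n- **No rollback path, no main environment**: being unable to roll back blocks continuing; do not proceed on trust or luck.\n\n## Preview Only\n\n- Explain who the project suits and what it can do\n- Demonstrate a typical conversation flow based on the project documentation\n- Help the user judge whether it is worth installing or researching further\n\n## Requires Post-Install Verification\n\n- Actually installing the Skill, plugin, or CLI\n- Running scripts, modifying local files, or accessing external services\n- Verifying real output quality, performance, and compatibility\n\n## Boundary and Risk Decision Card\n\n- 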
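**Mistaking the pre-install preview for a real run**: users may overestimate how much configuration, permission, and compatibility verification the project has already completed. Mitigation: clearly separate prompt_preview_can_do from runtime_required. Claim: `clm_0010` inferred 0.45\n- **Command execution modifies the local environment**: install commands may write to the user's home directory, the host plugin directory, or project configuration. Mitigation: run in an isolated environment or test account first. Evidence: `README.md` Claim: `clm_0011` supported 0.86\n- **To confirm**: will a real installation be compatible with the user's current host AI version? Reason: compatibility can only be verified in the actual host environment.\n- **To confirm**: does the project's output quality meet the user's specific task? Reason: a pre-install preview only shows the flow and boundaries; it cannot replace real evaluation.\n- **To confirm**: does the install command need network access, permissions, or global writes? Reason: this affects installation risk in both enterprise and personal environments.\n\n## Pre-Work Context\n\n### Loading Order\n\n- Read how_to_use.host_ai_instruction first to establish the boundaries of this pre-install decision asset.\n- Read claim_graph_summary to confirm that facts come from the Claim/Evidence Graph, not the Human Wiki narrative.\n- Then read intended_users, capabilities, and quick_start_candidates to judge whether the user is a match.\n- When a concrete task needs doing, check role_skill_index first, then evidence_index.\n- For questions about real installation, file modification, network access, performance, or compatibility, switch to risk_card and boundaries.runtime_required.\n\n### Task Routing\n\n- **Command-line launch or installation flow**: first explain that this capability needs post-install verification, then give a pre-install checklist. Boundary: must be verified after a real install or run. Evidence: `README.md` Claim: `clm_0001` supported 0.86\n\n### Context Size\n\n- Total files: 42\n- Important files covered: 31/42\n- Evidence index entries: 31\n- Role / Skill entries: 12\n\n### Handling Insufficient Evidence\n\n- **missing_evidence**: state that evidence is insufficient and ask the user for the target file, README section, or post-install verification record; do not fill in facts.\n- **out_of_scope_request**: state that the task is beyond the evidence in this AI Context Pack and suggest the user consult the Human Manual or verify after a real install.\n- **runtime_request**: give a pre-install checklist and the source of the commands, but do not execute commands for the user or claim to have done so.\n- **source_conflict**: show the conflicting sources side by side, mark them as pending verification, and do not force a choice.\n\n## Prompt Recipes\n\n### Fit Assessment\n\n- Goal: judge whether this project fits the user's current task.\n- Expected output: a fit conclusion, key reasons, evidence citations, what can be previewed before install, what must be verified after install, and a suggested next step.\n\n```text\nBased on the markfetch AI Context Pack, first ask me 3 necessary questions, then judge whether it fits my task. The answer must cover: who it suits, what it can do, what it cannot do, whether it is worth installing, and where the evidence comes from. Every project fact must cite evidence_refs, source_paths, or a claim_id.\n```\n\n### Pre-Install Experience\n\n- Goal: let the user get a feel for the core workflow before installing, without dressing the preview up as real capability or a marketing promise.\n- Expected output: an experience script with boundary labels, a post-install verification checklist, and cautious advice; no real-run promises or hard-sell wording.\n\n```text\nTreat markfetch as a pre-install experience asset, not as an installed tool or a real runtime environment.\n\nOutput exactly four parts:\n1. First ask me 3 necessary questions.\n2. Give an “experience script”: use the three labels [previewable before install], [must verify after install], and [insufficient evidence] to show how it might guide a workflow.\n3. Give a post-install verification checklist: list which capabilities can only be confirmed after a real install, real host loading, and a real project run.\n4. 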
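Give cautious advice: say only “worth further research / a trial install”, “provide more information before deciding”, or “not recommended to continue”; do not endorse the project.\n\nHard boundaries:\n- Do not claim to have installed, run, executed tests, modified files, or produced real results.\n- Do not use promissory wording such as “adapts automatically”, “guaranteed to pass”, “perfect fit”, or “strongly recommend installing”.\n- When describing post-install behavior, use conditional phrasing such as “if installation succeeds and the host loads the Skill correctly, it may…”.\n- The experience script may only be written as “sample lines / a hypothetical flow”: use “may ask / may suggest / may show”, not “has written, has generated, has passed, is running, is generating”.\n- Prompt Preview does not provide install commands; if the user is ready to trial-install, only advise reading the Quick Start and Risk Card first and verifying in an isolated environment.\n- Every project fact must come from a supported claim, evidence_refs, or source_paths; inferred/unverified items may serve only as risks or open questions.\n\n```\n\n### Role / Skill Selection\n\n- Goal: pick the best-matching asset from the project's roles or Skills.\n- Expected output: a list of candidate roles or Skills, each with its applicable scenario, evidence path, risk boundary, and whether post-install verification is needed.\n\n```text\nRead role_skill_index and recommend the 3-5 roles or Skills most relevant to my target task. For each recommendation, explain the applicable scenario, likely output, risk boundary, and evidence_refs.\n```\n\n### Risk Pre-Check\n\n- Goal: identify environment, permission, rule-conflict, and quality risks before installing or adopting.\n- Expected output: a checklist covering environment, permissions, dependencies, licensing, host conflicts, quality risks, and unknowns.\n\n```text\nBased on risk_card, boundaries, and quick_start_candidates, give me a pre-install risk checklist. Do not execute commands for me; only explain what I should check, why, and what the impact of a failure would be.\n```\n\n### Host AI Kickoff Instruction\n\n- Goal: turn the project context into a host AI instruction to use before a conversation starts.\n- Expected output: a kickoff instruction with clear boundaries and clear evidence citations, suitable for pasting into the host AI.\n\n```text\nBased on the markfetch AI Context Pack, generate a kickoff instruction that I can paste into my host AI. The instruction must honor not_runtime=true and must not claim the project has been installed, run, or produced real results.\n```\n\n\n## Role / Skill Index\n\n- 12 role / Skill / project-document entries indexed.\n\n- **markfetch** (project_doc): Reader View for AI agents and your shell. Fetch any URL, get back clean markdown — with a real Chrome's request fingerprint, not curl's. Activation hint: consult when the user needs to understand the project structure, installation, or boundaries. Evidence: `README.md`\n- **markfetch — SPEC** (project_doc): Errors throw MarkfetchError uniformly from core; adapters catch once. Codes: network error , http error , timeout , unsupported content type , extraction failed , too large , save failed ; plus save forbidden , emitted by the MCP adapter only before fetchMarkdown runs — see \"Asymmetric write sandbox\" under Core Decisions . CLI emits code message to stderr and exits 1; MCP emits { isError: true, content: { text: \" co… Activation hint: consult when the user needs to understand the project structure, installation, or boundaries. Evidence: `docs/SPEC.md`\n- **Changelog** (project_doc): All notable changes to this project are documented in this file. 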
Activation hint: consult when the user needs to understand the project structure, installation, or boundaries. Evidence: `CHANGELOG.md`\n- **Escape policy fixture** (project_doc): The protocol uses a fixed Huffman code https://en.wikipedia.org/wiki/Huffman coding -based header compression algorithm to keep responses bandwidth-efficient. The phrase above mirrors a real pattern observed on Wikipedia: a link followed immediately by a hyphenated suffix in the next text node. Activation hint: consult when the user needs to understand the project structure, installation, or boundaries. Evidence: `tests/fixtures/01-escape-policy-mid-prose.expected.md`\n- **Citation bracket fixture** (project_doc): HTTP/2 was developed by the IETF working group \\ 1\\ http://mock/ cite 1 based on Google's earlier SPDY protocol \\ 2\\ http://mock/ cite 2 . The standardisation document is RFC 7540, later obsoleted by RFC 9113. Activation hint: consult when the user needs to understand the project structure, installation, or boundaries. Evidence: `tests/fixtures/02-citation-bracket-link.expected.md`\n- **Informational responses http://mock/ informational responses** (project_doc): Informational responses http://mock/ informational responses Activation hint: consult when the user needs to understand the project structure, installation, or boundaries. Evidence: `tests/fixtures/03-anchor-chrome-mdn-style.expected.md`\n- **json — JSON encoder and decoder ¶ http://mock/ module-json \"Link to this heading\"** (project_doc): json — JSON encoder and decoder ¶ http://mock/ module-json \"Link to this heading\" Activation hint: consult when the user needs to understand the project structure, installation, or boundaries. Evidence: `tests/fixtures/04-sphinx-permalink.expected.md`\n- **Worldwide race to trace passengers from hantavirus-hit cruise ship** (project_doc): Worldwide race to trace passengers from hantavirus-hit cruise ship Activation hint: consult when the user needs to understand the project structure, installation, or boundaries. Evidence: `tests/fixtures/05-no-h1-bbc-style.expected.md`\n- **Multi-line table cell fixture** (project_doc): The conversion table below contains cells with bullet lists and multi-line content. CommonMark pipe-tables cannot express these structurally; the converter must either fall back to raw HTML or degrade gracefully without producing a broken pipe-table. 
Activation hint: consult when the user needs to understand the project structure, installation, or boundaries. Evidence: `tests/fixtures/06-multi-line-table-cell.expected.md`\n- **Intraword underscore fixture** (project_doc): Function signatures often italicise parameter names, producing fragments like json.dump obj, fp, \\ , skipkeys=False, ensure ascii=True, \\ \\ kw in rendered docs. CommonMark's left-flanking-delimiter rule means an underscore flanked by alphanumerics on both sides cannot open emphasis, so escaping it is unnecessary noise. Activation hint: consult when the user needs to understand the project structure, installation, or boundaries. Evidence: `tests/fixtures/07-intraword-underscore.expected.md`\n- **Code fence language fixture** (project_doc): Many documentation generators emit syntax-highlighted code blocks with a language hint encoded in the inner code element's class attribute. Common patterns include language-python , lang-js , and Highlight.js's hljs language-typescript . This fixture exercises whether markfetch preserves the language hint when emitting the fenced markdown code block. Activation hint: consult when the user needs to understand the project structure, installation, or boundaries. Evidence: `tests/fixtures/08-code-fence-language.expected.md`\n- **Baseline clean article** (project_doc): This fixture represents the head-of-distribution use case: an editorial article with a single H1, a few H2 sections, plain paragraphs, one inline link to example.com https://example.com/ , and a small unordered list. Nothing here exercises any edge case under repair. Activation hint: consult when the user needs to understand the project structure, installation, or boundaries. Evidence: `tests/fixtures/09-baseline-clean-article.expected.md`\n\n## Evidence Index\n\n- 31 evidence entries indexed.\n\n- **markfetch** (documentation): Reader View for AI agents and your shell. Fetch any URL, get back clean markdown — with a real Chrome's request fingerprint, not curl's. Evidence: `README.md`\n- **Package** (package_manifest): { \"name\": \"markfetch\", \"version\": \"0.6.0\", \"description\": \"Fetch a URL, return clean markdown. 
MCP server and CLI for AI agents.\", \"license\": \"MIT\", \"author\": { \"name\": \"Serhii Vasylenko\", \"email\": \"serhii@vasylenko.info\", \"url\": \"https://devdosvid.blog\" }, \"type\": \"module\", \"private\": false, \"engines\": { \"node\": \" =24\" }, \"bin\": { \"markfetch\": \"dist/index.js\" }, \"files\": \"dist\", \"LICENSE\", \"README.md\" , \"keywords\": \"mcp\", \"mcp-server\", \"model-context-protocol\", \"markdown\", \"fetch\", \"html-to-markdown\", \"scraping\", \"readability\", \"ai-agent\", \"claude\" , \"repository\": { \"type\": \"git\", \"url\": \"git+https://github.com/vasylenko/markfetch.git\" }, \"bugs\": { \"url\": \"https://github.com/vasylenko/mark… Evidence: `package.json`\n- **License** (source_file): Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files the \"Software\" , to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: Evidence: `LICENSE`\n- **markfetch — SPEC** (documentation): Errors throw MarkfetchError uniformly from core; adapters catch once. Codes: network error , http error , timeout , unsupported content type , extraction failed , too large , save failed ; plus save forbidden , emitted by the MCP adapter only before fetchMarkdown runs — see \"Asymmetric write sandbox\" under Core Decisions . CLI emits code message to stderr and exits 1; MCP emits { isError: true, content: { text: \" code message\" } } . Evidence: `docs/SPEC.md`\n- **Changelog** (documentation): All notable changes to this project are documented in this file. Evidence: `CHANGELOG.md`\n- **Escape policy fixture** (documentation): The protocol uses a fixed Huffman code https://en.wikipedia.org/wiki/Huffman coding -based header compression algorithm to keep responses bandwidth-efficient. 
The phrase above mirrors a real pattern observed on Wikipedia: a link followed immediately by a hyphenated suffix in the next text node. Evidence: `tests/fixtures/01-escape-policy-mid-prose.expected.md`\n- **Citation bracket fixture** (documentation): HTTP/2 was developed by the IETF working group \\ 1\\ http://mock/ cite 1 based on Google's earlier SPDY protocol \\ 2\\ http://mock/ cite 2 . The standardisation document is RFC 7540, later obsoleted by RFC 9113. Evidence: `tests/fixtures/02-citation-bracket-link.expected.md`\n- **Informational responses http://mock/ informational responses** (documentation): Informational responses http://mock/ informational responses Evidence: `tests/fixtures/03-anchor-chrome-mdn-style.expected.md`\n- **json — JSON encoder and decoder ¶ http://mock/ module-json \"Link to this heading\"** (documentation): json — JSON encoder and decoder ¶ http://mock/ module-json \"Link to this heading\" Evidence: `tests/fixtures/04-sphinx-permalink.expected.md`\n- **Worldwide race to trace passengers from hantavirus-hit cruise ship** (documentation): Worldwide race to trace passengers from hantavirus-hit cruise ship Evidence: `tests/fixtures/05-no-h1-bbc-style.expected.md`\n- **Multi-line table cell fixture** (documentation): The conversion table below contains cells with bullet lists and multi-line content. CommonMark pipe-tables cannot express these structurally; the converter must either fall back to raw HTML or degrade gracefully without producing a broken pipe-table. Evidence: `tests/fixtures/06-multi-line-table-cell.expected.md`\n- **Intraword underscore fixture** (documentation): Function signatures often italicise parameter names, producing fragments like json.dump obj, fp, \\ , skipkeys=False, ensure ascii=True, \\ \\ kw in rendered docs. CommonMark's left-flanking-delimiter rule means an underscore flanked by alphanumerics on both sides cannot open emphasis, so escaping it is unnecessary noise. 
Evidence: `tests/fixtures/07-intraword-underscore.expected.md`\n- **Code fence language fixture** (documentation): Many documentation generators emit syntax-highlighted code blocks with a language hint encoded in the inner code element's class attribute. Common patterns include language-python , lang-js , and Highlight.js's hljs language-typescript . This fixture exercises whether markfetch preserves the language hint when emitting the fenced markdown code block. Evidence: `tests/fixtures/08-code-fence-language.expected.md`\n- **Baseline clean article** (documentation): This fixture represents the head-of-distribution use case: an editorial article with a single H1, a few H2 sections, plain paragraphs, one inline link to example.com https://example.com/ , and a small unordered list. Nothing here exercises any edge case under repair. Evidence: `tests/fixtures/09-baseline-clean-article.expected.md`\n- **.Mcp** (structured_config): { \"mcpServers\": { \"markfetch\": { \"command\": \"npx\", \"args\": \"tsx\", \"src/index.ts\" } } } Evidence: `.mcp.json`\n- **Tsconfig** (structured_config): { \"compilerOptions\": { \"target\": \"ES2022\", \"module\": \"NodeNext\", \"moduleResolution\": \"NodeNext\", \"strict\": true, \"outDir\": \"dist\", \"rootDir\": \"src\", \"esModuleInterop\": true, \"skipLibCheck\": true, \"declaration\": false, \"resolveJsonModule\": true, \"forceConsistentCasingInFileNames\": true }, \"include\": \"src/ / \" } Evidence: `tsconfig.json`\n- **Keep text files LF on all platforms. Windows runners would otherwise** (source_file): Keep text files LF on all platforms. Windows runners would otherwise autocrlf .md fixtures to CRLF and break snapshot tests. text=auto eol=lf Evidence: `.gitattributes`\n- **.gitignore** (source_file): node modules/ dist/ .log .DS Store .tgz Evidence: `.gitignore`\n- **Source, tests, and TS build config not needed at runtime; dist/ ships instead** (source_file): Source, tests, and TS build config not needed at runtime; dist/ ships instead src/ tests/ .ts ! 
.d.ts tsconfig.json Evidence: `.npmignore`\n- **Postbuild** (source_file): // Sets execute bit on dist/index.js so the shebang-based launch works — // both when npm links the bin entry npm/npx exec the linked target // and when running ./dist/index.js directly. tsc preserves the shebang // but doesn't chmod its outputs. import { chmodSync } from \"node:fs\"; Evidence: `scripts/postbuild.mjs`\n- **Cli** (source_file): // CLI adapter. Imported lazily by index.ts when any argument is present // bare invocation routes to mcp.ts instead, preserving the existing MCP // server contract for every client config that doesn't pass args . // // Output channels: // - stdout: markdown body no -o OR \"Saved N bytes to \" confirmation // with -o . The markdown is written via process.stdout.write so its // own trailing whitespace is preserved verbatim — same bytes as the MCP // adapter would emit in content 0 .text. // - stderr: \" code message\" on any error path. Exits with non-zero code. // The project principle \"no ANSI escapes\" extends here — keep stderr // plain so shell pipelines can grep / split on the code prefix. Evidence: `src/cli.ts`\n- **Core** (source_file): // Pure pipeline + error types. Imported by both adapters mcp.ts and cli.ts . // Invariants: // - This module MUST NOT write to stdout or stderr. The MCP adapter relies on // stdout staying empty any non-JSON-RPC byte corrupts the protocol frame ; // the CLI adapter owns its own output channel. Errors are thrown, never // printed. // - This module MUST NOT import from @modelcontextprotocol/sdk or commander. // Keeping core transport-agnostic is what lets the dispatcher in index.ts // lazy-load only the adapter that's actually needed. Evidence: `src/core.ts`\n- **!/usr/bin/env node** (source_file): // Argv-discriminated dispatcher. // // process.argv.length === 2 means the user provided zero arguments // argv 0 is the path to node, argv 1 is this script path . 
That's the // shape every MCP client uses when spawning a server — so bare invocation // routes to the MCP adapter and preserves every existing client config. // // Any extra arg a URL, --help , --version , -o , even an unknown flag // routes to the CLI adapter, which uses commander to parse and validate. // // The dynamic import \"./mcp.js\" vs import \"./cli.js\" is intentional: // it ensures the MCP path never loads commander, and the CLI path never // loads @modelcontextprotocol/sdk. More importantly, it makes the stdout // inva… 证据：`src/index.ts`\n- **Mcp**（source_file）：// MCP adapter. Imported lazily by index.ts when invoked with zero arguments // the standard MCP client spawn shape . Wraps the unified fetchMarkdown // from core in the MCP tool-content shape and connects over stdio. // // Invariant: nothing in this module — or anything reachable from it — may // write to stdout. Stdout is the JSON-RPC frame channel; arbitrary writes // corrupt protocol framing and the client disconnects. Errors are returned // inside the MCP {isError: true, content: ... } envelope, not printed. // Stderr is also reserved project principle: stderr is fatal-only — every // per-request error round-trips through errorResult , never through logging. 证据：`src/mcp.ts`\n- **Sandbox**（source_file）：// Write-path containment for the MCP adapter. MCP's caller is a language // model — possibly steered by the page it just fetched — so this module // bounds the filesystem paths it can write to. CLI is intentionally // unbounded human at the shell is the security boundary ; only MCP uses // this module. // // Invariants: // - Leaf module: no imports from siblings, unit-testable in isolation. // - No console. — buildAllowedRoots throws escapes module init in // mcp.ts, surfaces on stderr ; checkPath returns a discriminated union. // - No hardcoded platform paths; every platform-dependent value comes // from a Node API. 
证据：`src/sandbox.ts`\n- **Helpers**（source_file）：// Shared test helpers extracted from cli.test.ts / server.test.ts / e2e.test.ts // / snapshots.test.ts to remove copy-paste duplication. Not a test file itself // — the runner pattern tsx --test tests/ .test.ts see package.json excludes // this file by name. 证据：`tests/_helpers.ts`\n- **Cli.Test**（source_file）：// CLI tests. Run the dispatcher via tsx src/index.ts as a real // subprocess so we observe exit codes, stdout, and stderr — the things // shell consumers actually depend on. The MCP SDK Client is irrelevant // here; this is a plain CLI surface. import { test } from \"node:test\"; import assert from \"node:assert/strict\"; import { execFile } from \"node:child process\"; import { promisify } from \"node:util\"; import { mkdtemp, readFile, rm, stat } from \"node:fs/promises\"; import { tmpdir } from \"node:os\"; import { join, resolve as resolvePath } from \"node:path\"; import { startMock, HAPPY FIXTURE, TSX LOADER URL } from \"./ helpers.js\"; 证据：`tests/cli.test.ts`\n- **E2E.Test**（source_file）：// E2E tests against the BUILT JS output node dist/index.js , not the dev // source. server.test.ts already exercises the full surface via tsx; this file // verifies that tsc output is itself correct and runnable. If server.test.ts // passes but this file fails, the bug lives in the build pipeline, not the // runtime logic. 
import { test, before } from \"node:test\"; import assert from \"node:assert/strict\"; import { execFile, execSync } from \"node:child process\"; import { promisify } from \"node:util\"; import { Client } from \"@modelcontextprotocol/sdk/client/index.js\"; import { StdioClientTransport } from \"@modelcontextprotocol/sdk/client/stdio.js\"; import { mkdtemp, readFile, rm } from \"node:… 证据：`tests/e2e.test.ts`\n- **Sandbox.Test**（source_file）：// Unit tests for src/sandbox.ts — narrow path-edge-cases that are painful // to validate via the integration boundary in server.test.ts ../ traversal, // prefix-overlap, multi-entry env split, fail-fast variants without an // integration analog, win32 case-fold . All other sandbox behaviors are // covered by T9–T13 in server.test.ts. 证据：`tests/sandbox.test.ts`\n- **Server.Test**（source_file）：import { test } from \"node:test\"; import assert from \"node:assert/strict\"; import { Client } from \"@modelcontextprotocol/sdk/client/index.js\"; import { StdioClientTransport } from \"@modelcontextprotocol/sdk/client/stdio.js\"; import { mkdtemp, readFile, stat, access, writeFile, rm, mkdir, symlink, } from \"node:fs/promises\"; import { tmpdir } from \"node:os\"; import { join, parse } from \"node:path\"; import { startMock, textOf, HAPPY FIXTURE, spawnClient, assertSchemaRejection, spawnAndCaptureExit, } from \"./ helpers.js\"; 证据：`tests/server.test.ts`\n- **Snapshots.Test**（source_file）：import { test, before, after } from \"node:test\"; import assert from \"node:assert/strict\"; import { readFile, readdir, writeFile } from \"node:fs/promises\"; import { dirname, join } from \"node:path\"; import { fileURLToPath } from \"node:url\"; import { spawnClient, startMock } from \"./ helpers.js\"; 证据：`tests/snapshots.test.ts`\n\n## 宿主 AI 必须遵守的规则\n\n- **把本资产当作开工前上下文，而不是运行环境。**：AI Context Pack 只包含证据化项目理解，不包含目标项目的可执行状态。 证据：`README.md`, `package.json`, `LICENSE`\n- **回答用户时区分可预览内容与必须安装后才能验证的内容。**：安装前体验的消费者价值来自降低误装和误判，而不是伪装成真实运行。 证据：`README.md`, 
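`package.json`, `LICENSE`\n\n## Questions the user should answer before starting\n\n- Which host AI or local environment do you plan to use it in?\n- Do you only want to preview the workflow first, or are you preparing a real installation?\n- What matters most to you: installation cost, output quality, or conflicts with existing rules?\n\n## Acceptance criteria\n\n- Every capability claim can be traced back to a file path in evidence_refs.\n- AI_CONTEXT_PACK.md does not present a preview as a real run.\n- A user can understand within 3 minutes who it is for, what it can do, how to start, and where the risk boundaries are.\n\n---\n\n## Doramagic Context Augmentation\n\nThe content below reinforces the main body of the Repomix/AI Context Pack. The Human Manual provides only a reading skeleton; the pitfall log is converted into working constraints that the host AI must follow.\n\n## Human Manual skeleton\n\nUsage rule: this is only a reading route and salience signal for the project, not a factual authority. Specific facts must still be traced back to repo evidence / Claim Graph.\n\nHard rules for the host AI:\n- Do not treat page titles, section order, summaries, or importance as evidence of project facts.\n- When explaining the Human Manual skeleton, state explicitly that it is only a reading route / salience signal.\n- Judgments about capability, installation, compatibility, runtime status, and risk must cite repo evidence, a source path, or the Claim Graph.\n\n- **Introduction**: importance `high`\n  - source_paths: README.md, src/index.ts\n- **Quick Start Guide**: importance `high`\n  - source_paths: README.md, package.json\n- **Processing Pipeline**: importance `high`\n  - source_paths: src/core.ts, package.json\n- **HTTP/2 Fingerprinting**: importance `high`\n  - source_paths: src/core.ts\n- **CLI Usage**: importance `high`\n  - source_paths: src/cli.ts, src/index.ts\n- **MCP Server Integration**: importance `high`\n  - source_paths: src/mcp.ts, src/index.ts, .mcp.json\n- **Environment Variables**: importance `high`\n  - source_paths: src/core.ts, src/sandbox.ts\n- **Write Sandbox Security**: importance `high`\n  - source_paths: src/sandbox.ts\n\n## Repo Inspection Evidence\n\n- repo_clone_verified: true\n- repo_inspection_verified: true\n- repo_commit: `bab725135ec30a217db6f34618e5e27772cee1e7`\n- inspected_files: `package.json`, `README.md`, `docs/SPEC.md`, `src/index.ts`, `src/mcp.ts`, `src/cli.ts`, `src/sandbox.ts`, `src/core.ts`\n\nHard rules for the host AI:\n- Without repo_clone_verified=true, do not claim to have read the source code.\n- Without repo_inspection_verified=true, do not state judgments about README/docs/package files as facts.\n- Without quick_start_verified=true, do not claim that the Quick Start has been run successfully.\n\n## Doramagic Pitfall Constraints\n\nThese rules come from project-specific pitfalls found during Doramagic discovery, validation, or compilation. The host AI must treat them as working constraints, not as ordinary explanatory text.\n\n### Constraint 1: Source evidence: v0.4.1\n\n- Trigger: GitHub 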
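community evidence indicates the project has an installation-related issue pending verification: v0.4.1\n- Host AI rule: The source suggests a fix, workaround, or version change may already exist; the manual must state the applicable version.\n- Why it matters: It may increase the cost of trial for new users and of production integration.\n- Evidence: community_evidence:github | cevd_749b65614f7b40e0b524f4e932cd4aca | https://github.com/vasylenko/markfetch/releases/tag/v0.4.1 | The source discussion mentions node-related conditions; re-check before installing or trialing.\n- Hard boundary: Do not present this pitfall as resolved, verified, or ignorable unless later validation evidence clearly shows it has been closed.\n\n### Constraint 2: Capability judgment depends on assumptions\n\n- Trigger: README/documentation is current enough for a first validation pass.\n- Host AI rule: Turn the assumptions into a downstream validation checklist.\n- Why it matters: If the assumption does not hold, users will not get the promised capability.\n- Evidence: capability.assumptions | github_repo:1234238440 | https://github.com/vasylenko/markfetch | README/documentation is current enough for a first validation pass.\n- Hard boundary: Do not present this pitfall as resolved, verified, or ignorable unless later validation evidence clearly shows it has been closed.\n\n### Constraint 3: Maintenance activity unknown\n\n- Trigger: last_activity_observed is not recorded.\n- Host AI rule: Add signals for recent GitHub commits, releases, and issue/PR responses.\n- Why it matters: New, abandoned, and active projects get mixed together, lowering confidence in recommendations.\n- Evidence: evidence.maintainer_signals | github_repo:1234238440 | https://github.com/vasylenko/markfetch | last_activity_observed missing\n- Hard boundary: Do not present this pitfall as resolved, verified, or ignorable unless later validation evidence clearly shows it has been closed.\n\n### Constraint 4: Downstream validation found risk items\n\n- Trigger: no_demo\n- Host AI rule: Send to the security/permission governance review queue.\n- Why it matters: Downstream has already requested a review, so this must not be played down on the page.\n- Evidence: downstream_validation.risk_items | github_repo:1234238440 | https://github.com/vasylenko/markfetch | no_demo; severity=medium\n- Hard boundary: Do not present this pitfall as resolved, verified, or ignorable unless later validation evidence clearly shows it has been closed.\n\n### Constraint 5: Scoring risk present\n\n- Trigger: no_demo\n- Host AI rule: Write the risk into the boundary card and confirm whether manual review is needed.\n- Why it matters: The risk affects whether the project is suitable for ordinary users to install.\n- Evidence: risks.scoring_risks | github_repo:1234238440 | https://github.com/vasylenko/markfetch | no_demo; severity=medium\n- Hard boundary: Do not present this pitfall as resolved, verified, or ignorable unless later validation evidence clearly shows it has been closed.\n\n### Constraint 6: Issue/PR response quality unknown\n\n- Trigger: issue_or_pr_quality=unknown.\n- Host AI rule: Sample recent issues/PRs and judge whether they go unhandled for long periods.\n- Why it matters: Users cannot tell whether anyone will maintain the project when they hit a problem.\n- Evidence: evidence.maintainer_signals | github_repo:1234238440 | 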
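https://github.com/vasylenko/markfetch | issue_or_pr_quality=unknown\n- Hard boundary: Do not present this pitfall as resolved, verified, or ignorable unless later validation evidence clearly shows it has been closed.\n\n### Constraint 7: Release cadence unclear\n\n- Trigger: release_recency=unknown.\n- Host AI rule: Confirm that the latest release/tag matches the install command in the README.\n- Why it matters: The install command and documentation may lag behind the code, raising the chance that users hit pitfalls.\n- Evidence: evidence.maintainer_signals | github_repo:1234238440 | https://github.com/vasylenko/markfetch | release_recency=unknown\n- Hard boundary: Do not present this pitfall as resolved, verified, or ignorable unless later validation evidence clearly shows it has been closed.\n",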
      "summary": "Context and working boundaries for the host AI.",
      "title": "AI Context Pack / For My AI"
    },
    "boundary_risk_card": {
      "asset_id": "boundary_risk_card",
      "filename": "BOUNDARY_RISK_CARD.md",
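      "markdown": "# Boundary & Risk Card / Pre-Installation Decision Card\n\nProject: vasylenko/markfetch\n\n## Doramagic trial conclusion\n\nCurrent conclusion: the project may proceed to the pre-release recommendation check; first use should still start with least privilege, a temporary directory, and a configuration that can be rolled back.\n\n## What the user can do now\n\n- Read the Human Manual first to understand the project's purpose and main workflows.\n- Copy the Prompt Preview for a pre-installation trial; this only validates the feel of the interaction and does not represent a real run.\n- Run the official Quick Start command in an isolated environment to verify it; do not run it directly in your primary environment.\n\n## What not to do now\n\n- Do not treat the Prompt Preview as the project's actual run result.\n- Do not treat metadata-only validation as sandbox installation validation.\n- Do not describe unverified capabilities as “supported, verified working, safe to install”.\n- Do not hand over production data, private files, real keys, or your primary configuration directory during a first trial.\n\n## Pre-installation checks\n\n- Host AI match: mcp_host\n- Official install entry status: official entry found\n- Verified in a temporary directory, temporary host, or container: required\n- Configuration changes can be rolled back: required\n- Whether an API key, network access, file read/write, or host configuration changes are needed: treat as high risk until confirmed\n- Install command, actual output, and failure logs recorded: required\n\n## Current blockers\n\n- review_required: community_discussion_evidence_below_public_threshold\n\n## Project-specific pitfalls\n\n- Source evidence: v0.4.1 (medium): may increase the cost of trial for new users and of production integration. Suggested check: the source suggests a fix, workaround, or version change may already exist; the manual must state the applicable version.\n- Capability judgment depends on assumptions (medium): if the assumption does not hold, users will not get the promised capability. Suggested check: turn the assumptions into a downstream validation checklist.\n- Maintenance activity unknown (medium): new, abandoned, and active projects get mixed together, lowering confidence in recommendations. Suggested check: add signals for recent GitHub commits, releases, and issue/PR responses.\n- Downstream validation found risk items (medium): downstream has already requested a review, so this must not be played down on the page. Suggested check: send to the security/permission governance review queue.\n- Scoring risk present (medium): the risk affects whether the project is suitable for ordinary users to install. Suggested check: write the risk into the boundary card and confirm whether manual review is needed.\n\n## Risk and permission notes\n\n- no_demo: medium\n\n## Evidence gaps\n\n- No structured evidence gaps found so far.\n",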
      "summary": "Installation, permissions, validation, and pre-recommendation risks.",
      "title": "Boundary & Risk Card"
    },
    "human_manual": {
      "asset_id": "human_manual",
      "filename": "HUMAN_MANUAL.md",
      "markdown": "# https://github.com/vasylenko/markfetch 项目说明书\n\n生成时间：2026-05-15 08:07:16 UTC\n\n## 目录\n\n- [Introduction](#introduction)\n- [Quick Start Guide](#quickstart)\n- [Processing Pipeline](#processing-pipeline)\n- [HTTP/2 Fingerprinting](#http-fingerprinting)\n- [CLI Usage](#cli-usage)\n- [MCP Server Integration](#mcp-server)\n- [Environment Variables](#environment-variables)\n- [Write Sandbox Security](#write-sandbox)\n- [Error Handling](#error-handling)\n- [Development Guide](#development)\n\n<a id='introduction'></a>\n\n## Introduction\n\n### 相关页面\n\n相关主题：[Quick Start Guide](#quickstart), [Processing Pipeline](#processing-pipeline)\n\n<details>\n<summary>相关源码文件</summary>\n\n以下源码文件用于生成本页说明：\n\n- [README.md](https://github.com/vasylenko/markfetch/blob/main/README.md)\n- [CHANGELOG.md](https://github.com/vasylenko/markfetch/blob/main/CHANGELOG.md)\n- [src/core.ts](https://github.com/vasylenko/markfetch/blob/main/src/core.ts)\n- [src/cli.ts](https://github.com/vasylenko/markfetch/blob/main/src/cli.ts)\n- [src/mcp.ts](https://github.com/vasylenko/markfetch/blob/main/src/mcp.ts)\n- [src/sandbox.ts](https://github.com/vasylenko/markfetch/blob/main/src/sandbox.ts)\n</details>\n\n# Introduction\n\n## What is markfetch?\n\n**markfetch** is a Node.js tool that fetches public HTTP/S URLs and returns clean, readable markdown — indistinguishable from what a human would get by running \"Save as Markdown\" in a browser. 
It is designed to provide high-quality content extraction for language models, with a focus on producing output that LLM clients can actually consume reliably.\n\n资料来源：[README.md](https://github.com/vasylenko/markfetch/blob/main/README.md)\n\n## Core Design Philosophy\n\nmarkfetch is built around several key principles that differentiate it from generic fetching solutions:\n\n| Principle | Description |\n|-----------|-------------|\n| **Single-channel output** | Returns markdown in `content[0].text` only — no `structuredContent` that some LLM clients drop |\n| **Real-browser fingerprint** | Uses HTTP/2 transport with a coherent Chrome header set and `Sec-CH-UA-*` client hints |\n| **Reader-View extraction** | Leverages Mozilla's Readability library to extract the main article content |\n| **Zero-config defaults** | Works out of the box with sensible defaults |\n| **Deterministic errors** | 8 structured error codes for reliable error handling |\n\n资料来源：[README.md](https://github.com/vasylenko/markfetch/blob/main/README.md)\n\n## Architecture Overview\n\nmarkfetch follows an adapter pattern with a unified core:\n\n```mermaid\ngraph TD\n    A[User / LLM Client] --> B[Adapter Layer]\n    B --> C{Invocation Mode}\n    C -->|CLI args| D[cli.ts]\n    C -->|MCP stdio| E[mcp.ts]\n    D --> F[core.ts - fetchMarkdown]\n    E --> F\n    F --> G[HTTP Fetch - undici]\n    G --> H[Readability Extraction]\n    H --> I[Turndown Conversion]\n    I --> J[Markdown Output]\n```\n\n### Core Components\n\n| Component | File | Responsibility |\n|-----------|------|----------------|\n| **Core Pipeline** | `src/core.ts` | URL fetching, HTML parsing, content extraction, markdown conversion, error throwing |\n| **CLI Adapter** | `src/cli.ts` | Command-line argument parsing, stdout/stderr output |\n| **MCP Adapter** | `src/mcp.ts` | Model Context Protocol stdio server, tool registration |\n| **Write Sandbox** | `src/sandbox.ts` | Path validation for file saves 
|\n\n资料来源：[src/core.ts](https://github.com/vasylenko/markfetch/blob/main/src/core.ts), [src/cli.ts](https://github.com/vasylenko/markfetch/blob/main/src/cli.ts), [src/mcp.ts](https://github.com/vasylenko/markfetch/blob/main/src/mcp.ts)\n\n## Two Operating Modes\n\n### CLI Mode\n\nThe command-line interface accepts a URL and outputs markdown to stdout:\n\n```bash\nmarkfetch https://en.wikipedia.org/wiki/Markdown\n```\n\nOptions include:\n- `-o, --output <path>` — Save markdown to a file\n- `-V, --version` — Print version\n- `-h, --help` — Print usage\n\nThe CLI respects the same environment variables as the MCP mode and resolves relative output paths against the current working directory.\n\n资料来源：[README.md](https://github.com/vasylenko/markfetch/blob/main/README.md), [src/cli.ts](https://github.com/vasylenko/markfetch/blob/main/src/cli.ts)\n\n### MCP Mode\n\nThe Model Context Protocol server provides a single tool `fetch_markdown(url, savePath?)` for integration with LLM clients like Claude Code, Cursor, or Goose:\n\n```json\n{\n  \"mcpServers\": {\n    \"markfetch\": {\n      \"command\": \"npx\",\n      \"args\": [\"-y\", \"markfetch\"]\n    }\n  }\n}\n```\n\nThe MCP mode has additional security features:\n- **Write sandbox**: File saves are restricted to allowed write roots\n- **Lazy loading**: The CLI adapter is never loaded in MCP mode, ensuring `console.log` is never reachable\n\n资料来源：[src/mcp.ts](https://github.com/vasylenko/markfetch/blob/main/src/mcp.ts), [src/cli.ts](https://github.com/vasylenko/markfetch/blob/main/src/cli.ts)\n\n## Content Extraction Pipeline\n\nThe markdown conversion process involves several stages:\n\n```mermaid\ngraph LR\n    A[HTML Response] --> B[Decode Encoded Tags]\n    B --> C[Ensure Base Href]\n    C --> D[Rewrite for Readability]\n    D --> E[Readability Parse]\n    E --> F[Turndown Convert]\n    F --> G[Prune Empty Headings]\n    G --> H[Clean Markdown]\n```\n\n### Extraction Details\n\n1. 
**Encoded Tag Decoding**: Handles HTML entities like `&lt;code&gt;` in code blocks\n2. **Base Href Injection**: Ensures relative URLs become absolute using the canonical URL\n3. **Pre-processing Rewrites**: Handles footnotes, `<details>` elements, and MediaWiki-specific structures\n4. **Readability Parsing**: Extracts main article content using Mozilla Readability with `keepClasses: true` to preserve language hints on code blocks\n5. **Markdown Conversion**: Uses Turndown with a custom escape function to avoid noisy backslash escapes\n6. **Heading Pruning**: Removes empty headings left by stripped interactive widgets\n\n资料来源：[src/core.ts](https://github.com/vasylenko/markfetch/blob/main/src/core.ts)\n\n## Error Handling\n\nmarkfetch provides 8 deterministic error codes:\n\n| Error Code | Meaning |\n|------------|---------|\n| `network_error` | DNS, TCP, TLS failure, or unexpected fetcher error |\n| `http_error` | Non-2xx status from upstream |\n| `timeout` | Request exceeded `MARKFETCH_TIMEOUT_MS` |\n| `unsupported_content_type` | Response is not `text/html` or `application/xhtml+xml` |\n| `extraction_failed` | Readability found no article content (typical for SPAs) |\n| `too_large` | Content exceeded `MARKFETCH_MAX_BYTES` |\n| `save_failed` | File write failed (permission, missing directory) |\n| `save_forbidden` | `savePath` resolves outside allowed write roots |\n\nAll errors are thrown uniformly from `core.ts` as `MarkfetchError` and caught by adapters for translation to their respective output formats.\n\n资料来源：[README.md](https://github.com/vasylenko/markfetch/blob/main/README.md)\n\n## Configuration\n\n| Variable | Default | Purpose |\n|----------|---------|---------|\n| `MARKFETCH_TIMEOUT_MS` | `30000` | Per-request timeout in milliseconds |\n| `MARKFETCH_MAX_BYTES` | `5000000` | Cap on response body and extracted markdown |\n| `MARKFETCH_USER_AGENT` | Chrome 130 string | Browser fingerprint; must be Chrome UA |\n| `MARKFETCH_ALLOWED_WRITE_ROOTS` | 
`os.tmpdir()` + `process.cwd()` | MCP-only; allowed file save paths |\n\nSource: [README.md](https://github.com/vasylenko/markfetch/blob/main/README.md)\n\n## What markfetch Is Not\n\nUnderstanding the boundaries helps set correct expectations:\n\n| Limitation | Explanation |\n|------------|-------------|\n| **Not a crawler** | One URL in, one document out. No recursion, `robots.txt` parsing, or rate limiting. |\n| **Not authenticated** | Anonymous fetch only. Pages behind login walls return public content or `http_error`. |\n| **Not a JS renderer** | Pure client-rendered SPAs with no static HTML return `extraction_failed`. SPAs with server-rendered content yield whatever static HTML they ship. |\n\nSource: [README.md](https://github.com/vasylenko/markfetch/blob/main/README.md)\n\n## Requirements\n\n- **Node.js ≥ 24**\n- **npm** for installation\n\n## Quick Start\n\n```bash\n# Install globally\nnpm i -g markfetch\n\n# Fetch a URL\nmarkfetch https://en.wikipedia.org/wiki/Markdown\n\n# Save to file\nmarkfetch https://example.com/article -o output.md\n```\n\nSource: [README.md](https://github.com/vasylenko/markfetch/blob/main/README.md)\n\n## Version History\n\n| Version | Date | Key Changes |\n|---------|------|-------------|\n| 0.6.0 | 2026-05-13 | Write sandbox, `save_forbidden` error, CI matrix expansion |\n| 0.5.0 | 2026-05-12 | CLI mode with lazy-loading dispatcher |\n| 0.4.1 | 2026-05-11 | Bug fixes and documentation improvements |\n| 0.4.0 | 2026-05-10 | MCP server with single `fetch_markdown` tool |\n\nSource: [CHANGELOG.md](https://github.com/vasylenko/markfetch/blob/main/CHANGELOG.md)\n\n---\n\n<a id='quickstart'></a>\n\n## Quick Start Guide\n\n### Related pages\n\nRelated topics: [Introduction](#introduction), [CLI Usage](#cli-usage), [MCP Server Integration](#mcp-server)\n\n<details>\n<summary>Related source files</summary>\n\nThe following source files were used to generate this page:\n\n- [README.md](https://github.com/vasylenko/markfetch/blob/main/README.md)\n- [package.json](https://github.com/vasylenko/markfetch/blob/main/package.json)\n- 
[src/cli.ts](https://github.com/vasylenko/markfetch/blob/main/src/cli.ts)\n- [src/mcp.ts](https://github.com/vasylenko/markfetch/blob/main/src/mcp.ts)\n</details>\n\n# Quick Start Guide\n\nmarkfetch is a tool that fetches URLs and returns clean markdown output. It operates as both a CLI command and an MCP (Model Context Protocol) server, making it suitable for AI agents like Claude Code, Codex, and Gemini CLI.\n\n## Installation\n\n### Prerequisites\n\n- Node.js ≥ 24 资料来源：[package.json:8]()\n\n### CLI Installation (Global)\n\n```bash\nnpm i -g markfetch\n```\n\nAfter installation, the `markfetch` command is available globally. 资料来源：[README.md:38]()\n\n### CLI Installation (npx)\n\nFor one-off usage without global installation:\n\n```bash\nnpx -y markfetch <url>\n```\n\n### MCP Server Setup\n\nAdd markfetch to your MCP client configuration. The setup varies by client.\n\n#### Claude Code\n\n```bash\nclaude mcp add --scope user markfetch -- npx -y markfetch\n```\n\n#### Codex\n\n```json\n{\n  \"mcpServers\": {\n    \"markfetch\": {\n      \"command\": \"npx\",\n      \"args\": [\"-y\", \"markfetch\"]\n    }\n  }\n}\n```\n\n#### Gemini CLI\n\n```bash\ngemini mcp add -s user markfetch npx -y markfetch\n```\n\n#### Cursor / Goose / Other stdio-MCP Clients\n\n```json\n{\n  \"mcpServers\": {\n    \"markfetch\": {\n      \"command\": \"npx\",\n      \"args\": [\"-y\", \"markfetch\"]\n    }\n  }\n}\n```\n\n资料来源：[README.md:46-69]()\n\n## CLI Usage\n\n### Basic Fetch\n\n```bash\nmarkfetch <url>\n```\n\nThe fetched markdown is printed to stdout. 资料来源：[src/cli.ts:18]()\n\n### Save to File\n\n```bash\nmarkfetch <url> -o <path>\n```\n\nUse `-o` or `--output` to save markdown to a file. Relative paths resolve against the current working directory. 
资料来源：[src/cli.ts:12-15]()\n\nExample:\n\n```bash\nmarkfetch https://en.wikipedia.org/wiki/Markdown -o output.md\n```\n\n### Help and Version\n\n```bash\nmarkfetch --help\nmarkfetch --version\n```\n\n## MCP Tool Usage\n\n### Tool Name\n\n`fetch_markdown`\n\n### Parameters\n\n| Parameter | Type | Required | Description |\n|-----------|------|----------|-------------|\n| `url` | string | Yes | Absolute http(s) URL to fetch. The server follows redirects automatically. No authentication headers, cookies, or session state are sent. |\n| `savePath` | string | No | Absolute filesystem path. When provided, the fetched markdown is written to this path instead of returned in the response. |\n\n资料来源：[src/mcp.ts:22-33]()\n\n### Return Value\n\nThe tool returns markdown content in `content[0].text`. No `structuredContent` field is used — this ensures compatibility with MCP clients that forward only `structuredContent` to the model. 资料来源：[README.md:18-21]()\n\n## Environment Configuration\n\n| Variable | Default | Purpose |\n|----------|---------|---------|\n| `MARKFETCH_TIMEOUT_MS` | `30000` | Per-request timeout in milliseconds |\n| `MARKFETCH_MAX_BYTES` | `5000000` | Cap on response body and extracted markdown (5MB) |\n| `MARKFETCH_USER_AGENT` | Pinned Chrome 130 string | Override the User-Agent header. Must be a Chrome UA string. |\n| `MARKFETCH_ALLOWED_WRITE_ROOTS` | `os.tmpdir()` + `process.cwd()` | MCP-only. Colon-delimited (POSIX) or semicolon-delimited (Windows) list of absolute paths permitted for `savePath` writes. 
|\n\n资料来源：[README.md:99-103]()\n\n### Passing Environment Variables to MCP\n\n```json\n{\n  \"mcpServers\": {\n    \"markfetch\": {\n      \"command\": \"npx\",\n      \"args\": [\"-y\", \"markfetch\"],\n      \"env\": {\n        \"MARKFETCH_TIMEOUT_MS\": \"60000\"\n      }\n    }\n  }\n}\n```\n\n## Error Handling\n\nErrors are returned with deterministic codes in the format `[code] message`:\n\n| Code | Meaning |\n|------|---------|\n| `network_error` | DNS, TCP, or TLS failure |\n| `http_error` | Upstream returned a non-2xx status |\n| `timeout` | Request exceeded `MARKFETCH_TIMEOUT_MS` |\n| `unsupported_content_type` | Response was not `text/html` or `application/xhtml+xml` |\n| `extraction_failed` | Readability found no article content (typical for pure client-rendered SPAs) |\n| `too_large` | Response body or extracted markdown exceeded `MARKFETCH_MAX_BYTES` |\n| `save_failed` | `writeFile` failed (missing directory, permission denied) |\n| `save_forbidden` | `savePath` resolves outside the allowed write roots |\n\nErrors go to stderr with non-zero exit status in CLI mode. 
Source: [README.md:72-85]()\n\n## Quick Workflow\n\n```mermaid\ngraph TD\n    A[Start markfetch] --> B{Arguments provided?}\n    B -->|Yes, any argument| C[CLI Mode]\n    B -->|No arguments| D[MCP Server Mode]\n    C --> E[Fetch URL]\n    D --> F[Wait for MCP request]\n    E --> G{Output path specified?}\n    G -->|No| I[Print markdown to stdout]\n    G -->|Yes, -o path| J[Write to file]\n    J --> K[Print confirmation to stdout]\n    F --> H[Receive fetch_markdown request]\n    H --> M{savePath provided?}\n    M -->|No| N[Return markdown in tool result]\n    M -->|Yes| O[Write to file, return confirmation in tool result]\n```\n\n## Use Cases\n\n| Use Case | Recommended Mode | Command/Config |\n|----------|-------------------|----------------|\n| One-time URL fetch in shell | CLI | `markfetch <url>` |\n| Batch processing with shell scripts | CLI + `-o` | `markfetch <url> -o out.md` |\n| AI agent web content retrieval | MCP | Configure in client |\n| Large documents that exceed inline limits | MCP + `savePath` | Set `savePath` to local file |\n\n## Limitations\n\n- **Not a crawler**: No recursion, no `robots.txt` parsing. One URL in, one document out. Source: [README.md:89-91]()\n- **Not authenticated**: Anonymous fetch only. Pages behind login walls return whatever the public response is. Source: [README.md:93-95]()\n- **Not a JS renderer**: Pure client-rendered SPAs with no static HTML return `extraction_failed`. 
资料来源：[README.md:97-99]()\n\n---\n\n<a id='processing-pipeline'></a>\n\n## Processing Pipeline\n\n### 相关页面\n\n相关主题：[Introduction](#introduction), [HTTP/2 Fingerprinting](#http-fingerprinting), [Error Handling](#error-handling)\n\n<details>\n<summary>相关源码文件</summary>\n\n以下源码文件用于生成本页说明：\n\n- [src/core.ts](https://github.com/vasylenko/markfetch/blob/main/src/core.ts)\n- [src/cli.ts](https://github.com/vasylenko/markfetch/blob/main/src/cli.ts)\n- [src/mcp.ts](https://github.com/vasylenko/markfetch/blob/main/src/mcp.ts)\n- [src/sandbox.ts](https://github.com/vasylenko/markfetch/blob/main/src/sandbox.ts)\n- [package.json](https://github.com/vasylenko/markfetch/blob/main/package.json)\n- [README.md](https://github.com/vasylenko/markfetch/blob/main/README.md)\n</details>\n\n# Processing Pipeline\n\n## Overview\n\nThe Processing Pipeline is the core data flow engine in markfetch. It transforms raw HTML fetched from a URL into clean, readable markdown suitable for consumption by AI agents and language models. 
The pipeline is intentionally single-purpose (one URL in, one markdown document out), with no recursion, pagination, or client-side JavaScript rendering.\n\nThe pipeline operates identically whether invoked via CLI or MCP adapter, ensuring consistent behavior across both interfaces.\n\nSource: [src/core.ts](https://github.com/vasylenko/markfetch/blob/main/src/core.ts)\n\n## Architecture\n\nThe pipeline is composed of four stages executed sequentially:\n\n```mermaid\ngraph TD\n    A[URL Input] --> B[HTTP Fetch]\n    B --> C{Fetch succeeded?}\n    C -->|No| D[Error: network_error / http_error / timeout]\n    C -->|Yes| E[Content-Type Check]\n    E -->|Non-HTML| F[Error: unsupported_content_type]\n    E -->|HTML| G[Extract Article]\n    G -->|No Content| H[Error: extraction_failed]\n    G -->|Extracted| I[Convert to Markdown]\n    I --> J{Size Check}\n    J -->|Exceeds Limit| K[Error: too_large]\n    J -->|Valid| L{Save Path?}\n    L -->|Yes| M[Write to File / Error: save_forbidden / save_failed]\n    L -->|No| N[Return Markdown]\n```\n\nEach stage performs validation and may abort with a deterministic error code, ensuring failures are predictable and actionable.\n\nSource: [src/core.ts](https://github.com/vasylenko/markfetch/blob/main/src/core.ts)\n\n## Stage 1: HTTP Fetch\n\nThe fetch stage retrieves raw HTML from the target URL using Node.js `fetch` with a real-browser fingerprint.\n\n### Transport Configuration\n\n| Setting | Value | Purpose |\n|---------|-------|---------|\n| Protocol | HTTP/2 | Modern web fingerprint |\n| User-Agent | Chrome 130 (pinned) | Realistic browser identification |\n| Client Hints | Sec-CH-UA-* headers | Derived from User-Agent at startup |\n| Timeout | `MARKFETCH_TIMEOUT_MS` (default: 30000 ms) | Per-request budget |\n\nThe User-Agent string is validated at startup. 
Non-Chrome strings fail fast to prevent fingerprint inconsistencies that could trigger bot detection.\n\n资料来源：[README.md](https://github.com/vasylenko/markfetch/blob/main/README.md)\n\n### Error Conditions\n\n| Code | Trigger |\n|------|---------|\n| `network_error` | DNS failure, TCP failure, TLS error, unexpected fetcher error |\n| `http_error` | Non-2xx HTTP status code |\n| `timeout` | Response exceeds `MARKFETCH_TIMEOUT_MS` |\n\nRedirects are followed automatically by the underlying HTTP client.\n\n## Stage 2: Article Extraction\n\nArticle extraction identifies and isolates the main content from the fetched HTML, stripping navigation, sidebars, footers, and other boilerplate.\n\n### Technology Stack\n\n| Component | Library | Purpose |\n|-----------|---------|---------|\n| HTML Parser | `linkedom` | Parses HTML into a DOM-like structure |\n| Extraction | `readability` (Mozilla) | Identifies main article content |\n| Configuration | `keepClasses: true` | Preserves code block language hints |\n\nThe `linkedom` parser is chosen over native `DOMParser` to ensure consistent behavior across Node.js versions and environments.\n\n资料来源：[src/core.ts](https://github.com/vasylenko/markfetch/blob/main/src/core.ts)\n\n### Pre-Extraction Rewrites\n\nBefore Readability processes the document, the pipeline applies targeted HTML rewrites to normalize content and improve extraction quality:\n\n```typescript\nfunction rewriteForReadability(document: Document): void {\n  // Normalize code blocks (pre and code elements)\n  // Convert aside elements to sections\n  // Expand details/summary elements\n  // Flatten MediaWiki heading wrappers\n}\n```\n\nSpecific transformations include:\n\n| Transform | Target | Action |\n|-----------|--------|--------|\n| Code block normalization | `<pre>`, `<code>` | Standardize encoding artifacts |\n| Base href injection | `<head>` / `<html>` | Ensure absolute URLs after redirects |\n| Aside conversion | `<aside>` with footnote roles | Convert to 
`<section>` |\n| Details expansion | `<details>`, `<summary>` | Inline content |\n| Heading unwrapping | `div.mw-heading` | Remove MediaWiki wrappers |\n\n### Base Href Handling\n\nReadability and linkedom leave relative URLs unresolved unless a `<base>` element exists. The pipeline injects the post-redirect canonical URL to ensure all hrefs and srcs resolve correctly:\n\n```typescript\nfunction ensureBaseHref(html: string, url: string): string {\n  const safeUrl = url.replaceAll(\"&\", \"&amp;\").replaceAll('\"', \"&quot;\");\n  const stripped = html.replaceAll(/<base\\s[^>]*>/gi, \"\");\n  // Inject <base href=\"...\"> into <head> or <html>\n}\n```\n\n### Error Conditions\n\n| Code | Trigger |\n|------|---------|\n| `unsupported_content_type` | Response is not `text/html` or `application/xhtml+xml` |\n| `extraction_failed` | Readability returned empty content (typical for client-rendered SPAs) |\n\n资料来源：[src/core.ts](https://github.com/vasylenko/markfetch/blob/main/src/core.ts)\n\n## Stage 3: Markdown Conversion\n\nThe conversion stage transforms extracted HTML into clean markdown using Turndown with custom rules.\n\n### Technology Stack\n\n| Component | Library | Notes |\n|-----------|---------|-------|\n| HTML-to-MD | `turndown` | Configured with GFM rules |\n| Code fences | Custom rule | Preserves `class=\"language-X\"` as hint |\n\n### Custom Escape Behavior\n\nTurndown's default escape mechanism inserts backslashes before certain character sequences that might be misinterpreted as markdown. 
The pipeline removes two categories of unnecessary escapes:\n\n| Pattern | Before | After | Rationale |\n|---------|--------|-------|-----------|\n| Intraword underscores | `\\_` | `_` | Intraword underscores are valid |\n| Mid-line dash/equals | `\\-X`, `\\=X` | `-X`, `=X` | Not list markers or underlines when alphanumeric follows |\n\nThis prevents the output from containing visible escape characters that don't affect rendering.\n\n### Empty Heading Pruning\n\nThe conversion includes iterative pruning of empty headings — headings immediately followed by another heading with no body content. This commonly occurs when Readability strips interactive widgets (browser-compat tables, spec diagrams) but leaves the surrounding heading structure.\n\n### Title Handling\n\n| Condition | Output |\n|-----------|--------|\n| Content starts with `<h1>` | Use content heading, no duplicate |\n| Content lacks heading | Prepend `# {title}` from Readability |\n\n### Output Format\n\n```markdown\n# Page Title (if not already in content)\n\nArticle body with clean markdown conversion...\n```\n\n## Stage 4: Size Validation and Output\n\n### Size Limits\n\n| Limit | Environment Variable | Default |\n|-------|---------------------|---------|\n| Response body | `MARKFETCH_MAX_BYTES` | 5,000,000 bytes |\n| Extracted markdown | Same variable | Same default |\n\nThe pipeline checks both the raw HTTP response size and the final markdown size against this cap.\n\n### Error Conditions\n\n| Code | Trigger |\n|------|---------|\n| `too_large` | Body or markdown exceeds `MARKFETCH_MAX_BYTES` |\n\n### Output Routing\n\n| Mode | Destination | Behavior |\n|------|-------------|----------|\n| No `savePath` | Return value | `markdown` field contains content |\n| `savePath` (MCP) | File system | `savedTo` field contains path |\n| `savePath` (CLI) | File system | Confirmation to stdout |\n\n## Write Sandbox (MCP Only)\n\nWhen used as an MCP tool with a `savePath` parameter, writes are confined to an 
allowed set of root directories.\n\n### Default Roots\n\n| Platform | Roots |\n|----------|-------|\n| POSIX | `os.tmpdir()`, `process.cwd()` |\n| Windows | Same, case-insensitive comparison |\n\n### Configuration\n\n`MARKFETCH_ALLOWED_WRITE_ROOTS` overrides the defaults entirely. Paths use platform delimiters:\n\n| Platform | Delimiter | Example |\n|----------|-----------|---------|\n| POSIX | `:` | `/Users/me/out:/tmp` |\n| Windows | `;` | `C:\\Users\\me\\out;C:\\Temp` |\n\n### Error Conditions\n\n| Code | Trigger |\n|------|---------|\n| `save_forbidden` | `savePath` resolves outside allowed roots |\n| `save_failed` | `writeFile` failed (permissions, missing directory) |\n\nThe sandbox applies only to MCP mode. The CLI has no restrictions: the human at the shell is the security boundary.\n\nSource: [src/sandbox.ts](https://github.com/vasylenko/markfetch/blob/main/src/sandbox.ts)\n\n## Error Codes Reference\n\nThe pipeline returns exactly eight deterministic error codes:\n\n| Code | Stage | Description |\n|------|-------|-------------|\n| `network_error` | Fetch | DNS/TCP/TLS failure |\n| `http_error` | Fetch | Non-2xx status |\n| `timeout` | Fetch | Exceeded timeout budget |\n| `unsupported_content_type` | Fetch | Not HTML/XHTML |\n| `extraction_failed` | Extract | Readability found no content |\n| `too_large` | Convert/Validate | Exceeded size cap |\n| `save_forbidden` | Output | Path outside sandbox |\n| `save_failed` | Output | File write failed |\n\nAll errors use the format `[code] message` for easy parsing by consuming tools.\n\nSource: [README.md](https://github.com/vasylenko/markfetch/blob/main/README.md)\n\n## Data Flow Summary\n\n```mermaid\ngraph LR\n    A[URL] --> B[HTTP Fetch]\n    B --> C{HTML?}\n    C -->|Yes| D[HTML Rewrites]\n    C -->|No| E[Error]\n    D --> F[Readability]\n    F --> G[Extract Content]\n    G --> H[Turndown]\n    H --> I[Size Check]\n    I -->|OK| J[Output]\n    I -->|Large| K[Error]\n    J --> L{savePath?}\n    L -->|No| 
M[Return Markdown]\n    L -->|Yes| N[Write File]\n```\n\n## Pipeline Entry Points\n\n### CLI Adapter\n\nThe CLI adapter (`src/cli.ts`) parses arguments and delegates to the core pipeline:\n\n```typescript\nconst { markdown, bytes, savedTo } = await fetchMarkdown({\n  url,\n  // Resolve -o against cwd only when it was supplied\n  savePath: options.output ? resolve(process.cwd(), options.output) : undefined\n});\n```\n\nOutput behavior:\n- With `-o`: prints `Saved N bytes to <path>` to stdout\n- Without `-o`: writes raw markdown to stdout via `process.stdout.write`\n\nErrors print to stderr with `[code] message` format.\n\n### MCP Adapter\n\nThe MCP adapter (`src/mcp.ts`) registers the `fetch_markdown` tool and calls the core pipeline:\n\n```typescript\nserver.registerTool(\"fetch_markdown\", {\n  description: \"Fetch a single public HTTP/S URL...\",\n  inputSchema: {\n    url: z.string().url(),\n    savePath: z.string().refine(isAbsolute).optional()\n  }\n});\n```\n\nOutput is always returned via `content[0].text`, never `structuredContent`, ensuring compatibility with clients that only forward `content[]`.\n\nSource: [src/cli.ts](https://github.com/vasylenko/markfetch/blob/main/src/cli.ts)\nSource: [src/mcp.ts](https://github.com/vasylenko/markfetch/blob/main/src/mcp.ts)\n\n## Configuration Options\n\n| Variable | Default | Applies To | Purpose |\n|----------|---------|------------|---------|\n| `MARKFETCH_TIMEOUT_MS` | `30000` | Both | Per-request timeout |\n| `MARKFETCH_MAX_BYTES` | `5000000` | Both | Size cap |\n| `MARKFETCH_USER_AGENT` | Chrome 130 | Both | Browser fingerprint |\n| `MARKFETCH_ALLOWED_WRITE_ROOTS` | tmpdir + cwd | MCP only | Write sandbox roots |\n\nAll variables are validated at startup with fail-fast behavior: invalid values terminate the process immediately with a stderr message.\n\nSource: [README.md](https://github.com/vasylenko/markfetch/blob/main/README.md)\n\n## Dependencies\n\n| Package | Dependency type | Role |\n|---------|---------|------|\n| `linkedom` | runtime | HTML parsing |\n| `@mozilla/readability` | runtime | Article extraction |\n| 
`turndown` | runtime | HTML-to-markdown |\n| `turndown-plugin-gfm` | runtime | GitHub Flavored Markdown |\n| `commander` | runtime | CLI argument parsing |\n| `@modelcontextprotocol/sdk` | runtime | MCP server framework |\n\nNode.js ≥ 24 is required; the fetch stage relies on the runtime's native `fetch` and `Headers` support.\n\nSource: [package.json](https://github.com/vasylenko/markfetch/blob/main/package.json)\n\n---\n\n<a id='http-fingerprinting'></a>\n\n## HTTP/2 Fingerprinting\n\n### Related Pages\n\nRelated topics: [Processing Pipeline](#processing-pipeline), [Environment Variables](#environment-variables)\n\n<details>\n<summary>Relevant Source Files</summary>\n\nThe following source files were used to generate this page:\n\n- [src/core.ts](https://github.com/vasylenko/markfetch/blob/main/src/core.ts)\n- [src/mcp.ts](https://github.com/vasylenko/markfetch/blob/main/src/mcp.ts)\n- [src/cli.ts](https://github.com/vasylenko/markfetch/blob/main/src/cli.ts)\n- [README.md](https://github.com/vasylenko/markfetch/blob/main/README.md)\n- [CHANGELOG.md](https://github.com/vasylenko/markfetch/blob/main/CHANGELOG.md)\n- [package.json](https://github.com/vasylenko/markfetch/blob/main/package.json)\n</details>\n\n# HTTP/2 Fingerprinting\n\n## Overview\n\nHTTP/2 Fingerprinting is a technique used by markfetch to mimic real browser traffic when fetching web pages. Instead of making requests that appear to come from a typical HTTP library (like curl or a basic fetch implementation), markfetch generates HTTP/2 requests with headers and client hints that closely match those of an actual Chrome browser session.\n\nThis approach serves two purposes:\n\n1. **Bypass anti-bot measures**: Many websites employ fingerprinting techniques to detect and block automated scrapers. By presenting a header set that matches a genuine Chrome browser, markfetch is less likely to trigger these defenses.\n2. **Access SEO-rendered content**: Sites that serve different content to bots vs. 
browsers will return the full article content when markfetch requests arrive with Chrome-like fingerprints.\n\nSource: [README.md](https://github.com/vasylenko/markfetch/blob/main/README.md)\n\n## Architecture\n\n```mermaid\ngraph TD\n    A[URL Request] --> B{Adapter Type?}\n    B -->|MCP| C[src/mcp.ts]\n    B -->|CLI| D[src/cli.ts]\n    C --> E[src/core.ts - fetchMarkdown]\n    D --> E\n    E --> F[Undici Dispatcher]\n    F --> G[HTTP/2 Transport]\n    G --> H[Sec-CH-UA-* Client Hints]\n    G --> I[Chrome Headers]\n    H --> J[Upstream Server]\n    I --> J\n    J --> K[HTML Response]\n    K --> L[Readability Parser]\n    L --> M[Markdown Output]\n```\n\n## Implementation Details\n\n### User Agent String\n\nThe default user agent is a pinned Chrome 130 string. It can be overridden via the `MARKFETCH_USER_AGENT` environment variable, but the override must be a valid Chrome UA string.\n\n| Environment Variable | Default Value | Purpose |\n|---|---|---|\n| `MARKFETCH_USER_AGENT` | Pinned Chrome 130 string | Override the browser fingerprint UA |\n\n**Constraint**: The UA string must be a Chrome browser UA. Non-Chrome strings fail fast at startup because `Sec-CH-UA-*` client hints are derived from the UA at initialization time.\n\nSource: [README.md](https://github.com/vasylenko/markfetch/blob/main/README.md)\n\n### Client Hints Generation\n\nWhen the server starts, markfetch parses the `MARKFETCH_USER_AGENT` value and derives `Sec-CH-UA-*` client hint headers from it. 
These hints are sent with every HTTP/2 request and include:\n\n- `Sec-CH-UA`: Browser brand and version\n- `Sec-CH-UA-Mobile`: Mobile indicator\n- `Sec-CH-UA-Platform`: Operating system\n\n```mermaid\ngraph LR\n    A[MARKFETCH_USER_AGENT<br/>Chrome 130] --> B[Startup<br/>Initialization]\n    B --> C[Sec-CH-UA Header<br/>Derived Value]\n    B --> D[Sec-CH-UA-Mobile<br/>Derived Value]\n    B --> E[Sec-CH-UA-Platform<br/>Derived Value]\n    C --> F[Every HTTP/2<br/>Request]\n    D --> F\n    E --> F\n```\n\nSource: [README.md](https://github.com/vasylenko/markfetch/blob/main/README.md)\n\n### HTTP/2 Transport\n\nMarkfetch uses the undici HTTP client library with HTTP/2 protocol support. The HTTP/2 transport is selected automatically by undici when the server supports it, enabling:\n\n- Multiplexed requests over a single connection\n- Header compression\n- Server push capabilities\n\nThe combination of HTTP/2 transport and a coherent Chrome header set produces a request profile that closely resembles a real Chrome session.\n\nSource: [README.md](https://github.com/vasylenko/markfetch/blob/main/README.md)\n\n### Request Flow\n\n```mermaid\nsequenceDiagram\n    participant Client\n    participant Markfetch\n    participant Undici\n    participant Server\n\n    Client->>Markfetch: fetch_markdown(url)\n    Markfetch->>Markfetch: Validate MARKFETCH_USER_AGENT\n    Markfetch->>Undici: Dispatch with Chrome headers\n    Undici->>Server: GET /path HTTP/2<br/>Sec-CH-UA: \"Chromium\"<br/>Sec-CH-UA-Mobile: ?0<br/>Sec-CH-UA-Platform: \"Windows\"\n    Server->>Undici: HTTP/2 200 OK<br/>text/html\n    Undici->>Markfetch: HTML Content\n    Markfetch->>Markfetch: Apply Readability\n    Markfetch->>Markfetch: Convert to Markdown\n    Markfetch->>Client: Clean Markdown\n```\n\n## Configuration\n\n### Environment Variables\n\n| Variable | Default | Purpose |\n|---|---|---|\n| `MARKFETCH_TIMEOUT_MS` | 
`30000` | Per-request timeout in milliseconds |\n| `MARKFETCH_MAX_BYTES` | `5000000` | Cap on response body and extracted markdown |\n| `MARKFETCH_USER_AGENT` | Pinned Chrome 130 | Browser fingerprint override |\n\n### Validation\n\nAll environment variables are validated at startup. Invalid values cause the process to fail fast on stderr with descriptive error messages, rather than producing confusing per-request errors.\n\nSource: [README.md](https://github.com/vasylenko/markfetch/blob/main/README.md)\n\n## Integration Points\n\n### MCP Adapter\n\nThe MCP server (`src/mcp.ts`) uses the core fetch pipeline, which includes the HTTP/2 fingerprinting. The tool description reads:\n\n> Fetch a single public HTTP/S URL and return its main article content as clean markdown. Best for articles, documentation, blog posts, news, and reference pages. Non-HTML responses return `unsupported_content_type`.\n\nSource: [src/mcp.ts](https://github.com/vasylenko/markfetch/blob/main/src/mcp.ts)\n\n### CLI Adapter\n\nThe CLI adapter (`src/cli.ts`) also uses the same core fetch pipeline, ensuring consistent HTTP/2 fingerprinting behavior whether invoked via MCP or command line:\n\n```bash\nmarkfetch https://en.wikipedia.org/wiki/Markdown\n```\n\nSource: [src/cli.ts](https://github.com/vasylenko/markfetch/blob/main/src/cli.ts)\n\n## Version History\n\n| Version | Date | Change |\n|---|---|---|\n| 0.4.0 | 2026-05-10 | HTTP/2 fingerprinting feature added with Sec-CH-UA-* client hints |\n| 0.5.0 | 2026-05-12 | CLI mode added with same fingerprinting behavior |\n| 0.6.0 | Current | Enhanced write sandbox and validation |\n\nSource: [CHANGELOG.md](https://github.com/vasylenko/markfetch/blob/main/CHANGELOG.md)\n\n## Limitations\n\n### SPA Handling\n\nPure client-rendered Single Page Applications (SPAs) with no static HTML content return `extraction_failed`. 
Sites that ship server-rendered or SEO-prerendered HTML will extract whatever static content they expose to a Chrome-like client.\n\n### Authentication\n\nMarkfetch performs anonymous fetches only: no cookie jar, no auth headers, no session reuse. Pages behind login walls return whatever the public response is, usually surfaced as `http_error`.\n\nSource: [README.md](https://github.com/vasylenko/markfetch/blob/main/README.md)\n\n## Security Considerations\n\nThe HTTP/2 fingerprinting approach makes requests appear legitimate, which places responsibility for appropriate use on the user. The documentation explicitly states:\n\n> Use it on URLs whose targets you have permission to fetch, and respect the terms of service of any site you query. The maintainer assumes no liability for misuse.\n\nSource: [README.md](https://github.com/vasylenko/markfetch/blob/main/README.md)\n\n---\n\n<a id='cli-usage'></a>\n\n## CLI Usage\n\n### Related Pages\n\nRelated topics: [Quick Start Guide](#quickstart), [MCP Server Integration](#mcp-server), [Write Sandbox Security](#write-sandbox)\n\n<details>\n<summary>Relevant Source Files</summary>\n\nThe following source files were used to generate this page:\n\n- [src/cli.ts](https://github.com/vasylenko/markfetch/blob/main/src/cli.ts)\n- [src/index.ts](https://github.com/vasylenko/markfetch/blob/main/src/index.ts)\n- [src/core.ts](https://github.com/vasylenko/markfetch/blob/main/src/core.ts)\n- [README.md](https://github.com/vasylenko/markfetch/blob/main/README.md)\n- [package.json](https://github.com/vasylenko/markfetch/blob/main/package.json)\n- [CHANGELOG.md](https://github.com/vasylenko/markfetch/blob/main/CHANGELOG.md)\n</details>\n\n# CLI Usage\n\nThe markfetch CLI provides a command-line interface for fetching URLs and converting their content to clean markdown. 
It operates as one of two execution surfaces (the other being the MCP (Model Context Protocol) stdio server), and both share the same underlying core pipeline.\n\n## Overview\n\nThe CLI accepts a URL as its primary argument and outputs the converted markdown to stdout or to a specified file. It was introduced in version 0.5.0 as a way to make markfetch accessible from standard shell environments, pipelines, and scripts.\n\n| Aspect | Details |\n|--------|---------|\n| Entry Point | `markfetch <url>` |\n| Output | stdout (default) or file via `-o` |\n| Version | 0.6.0 |\n| Runtime | Node.js ≥ 24 |\n| Distribution | npm package |\n\nSource: [README.md](https://github.com/vasylenko/markfetch/blob/main/README.md)\n\n## Architecture\n\nThe CLI is implemented as an adapter layer that delegates to the shared core. When the process is invoked with arguments, the dispatcher in `index.ts` lazy-loads the CLI adapter; bare invocation (no arguments, so `process.argv.length === 2`) routes to the MCP server instead.\n\n```mermaid\ngraph TD\n    A[\"markfetch CLI Invocation<br/>process.argv.length > 2\"] --> B[\"src/index.ts<br/>Dispatcher\"]\n    B --> C[\"src/cli.ts<br/>CLI Adapter\"]\n    C --> D[\"src/core.ts<br/>fetchMarkdown()\"]\n    D --> E[\"src/sandbox.ts<br/>Write Validation\"]\n    D --> F[\"HTTP Fetch + Readability + Turndown\"]\n    \n    G[\"Bare Invocation<br/>process.argv.length === 2\"] --> H[\"src/mcp.ts<br/>MCP Server\"]\n```\n\nSource: [src/cli.ts:39-47](https://github.com/vasylenko/markfetch/blob/main/src/cli.ts)\n\n## Command Syntax\n\n```bash\nmarkfetch <url> [options]\n```\n\n### Arguments\n\n| Argument | Required | Description |\n|----------|----------|-------------|\n| `<url>` | Yes | Absolute http(s) URL to fetch |\n\n### Options\n\n| Flag | Description |\n|------|-------------|\n| `-o, --output <path>` | Save markdown to a file (absolute or relative path). Default is stdout. 
|\n| `-V, --version` | Print version and exit |\n| `-h, --help` | Print usage and exit |\n\nSource: [src/cli.ts:23-30](https://github.com/vasylenko/markfetch/blob/main/src/cli.ts)\n\n## Output Behavior\n\nThe CLI maintains strict separation between its output channels:\n\n| Scenario | Channel | Content |\n|----------|---------|---------|\n| Raw markdown (no `-o`) | stdout | Raw markdown body via `process.stdout.write()` |\n| File output (`-o`) | stdout | Confirmation: `Saved N bytes to <path>` |\n| Any error | stderr | `[code] message` |\n\nThe raw markdown is written using `process.stdout.write()` rather than `console.log()`, which would append an extra newline, so the output matches the exact bytes the MCP adapter would emit in `content[0].text`.\n\nSource: [src/cli.ts:50-58](https://github.com/vasylenko/markfetch/blob/main/src/cli.ts)\n\n## Error Handling\n\nErrors are written to stderr with a deterministic format: `[code] message`. The process exits with a non-zero status code.\n\n```typescript\nprocess.exitCode = 1;\nconsole.error(`[${code}] ${message}`);\n```\n\nThe CLI uses `process.exitCode` (not `process.exit()`) to ensure pending output drains before the process exits, which matters when stdout is piped to a slow consumer.\n\nSource: [src/cli.ts:58-62](https://github.com/vasylenko/markfetch/blob/main/src/cli.ts)\n\n### Error Codes\n\n| Code | Meaning |\n|------|---------|\n| `network_error` | DNS / TCP / TLS failure |\n| `http_error` | Upstream returned a non-2xx status |\n| `timeout` | Request exceeded `MARKFETCH_TIMEOUT_MS` |\n| `unsupported_content_type` | Response was not HTML |\n| `extraction_failed` | No extractable article content |\n| `too_large` | Response or markdown exceeded `MARKFETCH_MAX_BYTES` |\n| `save_failed` | File write failed (permission denied, etc.) 
|\n\nNote: `save_forbidden` is MCP-only and does not apply to the CLI (no sandbox).\n\nSource: [README.md](https://github.com/vasylenko/markfetch/blob/main/README.md)\n\n## Path Resolution\n\nThe CLI resolves relative output paths against the current working directory before passing them to the core:\n\n```typescript\nconst savePath = options.output\n  ? resolve(process.cwd(), options.output)\n  : undefined;\n```\n\nTilde expansion is intentionally **not** performed: the shell expands `~/foo` before argv reaches the process, and a quoted literal `'~/foo'` is treated as the relative path `~/foo` under cwd (standard tool behavior).\n\nSource: [src/cli.ts:32-39](https://github.com/vasylenko/markfetch/blob/main/src/cli.ts)\n\n## Environment Variables\n\nThese environment variables apply to both CLI and MCP modes:\n\n| Variable | Default | Purpose |\n|----------|---------|---------|\n| `MARKFETCH_TIMEOUT_MS` | `30000` | Per-request timeout in ms |\n| `MARKFETCH_MAX_BYTES` | `5000000` | Cap on response body and extracted markdown |\n| `MARKFETCH_USER_AGENT` | Chrome 130 string | Override User-Agent header |\n\nThe CLI adapter imports `fetchMarkdown` and `classifyError` from the core module, which validates these environment variables at startup.\n\nSource: [src/cli.ts:15](https://github.com/vasylenko/markfetch/blob/main/src/cli.ts) and [README.md](https://github.com/vasylenko/markfetch/blob/main/README.md)\n\n## File Structure\n\nThe project source is organized into adapter modules:\n\n```\nsrc/\n├── index.ts    # Dispatcher (lazy-loads cli.ts or mcp.ts)\n├── core.ts     # Shared pipeline and errors\n├── cli.ts      # CLI adapter (commander-based)\n└── mcp.ts      # MCP stdio server adapter\n```\n\nThe lazy-import pattern ensures that `cli.ts` code (which calls `console.log`) is never loaded when running in MCP mode, preserving the \"stdout is reserved for MCP frames\" invariant structurally.\n\nSource: [CHANGELOG.md](https://github.com/vasylenko/markfetch/blob/main/CHANGELOG.md) and 
[src/cli.ts:1-13](https://github.com/vasylenko/markfetch/blob/main/src/cli.ts)\n\n## Installation\n\nInstall globally via npm:\n\n```bash\nnpm i -g markfetch\n```\n\nOr use via npx without installation:\n\n```bash\nnpx -y markfetch <url>\n```\n\nThe `bin` entry in `package.json` points to `dist/index.js`:\n\n```json\n{\n  \"bin\": {\n    \"markfetch\": \"dist/index.js\"\n  }\n}\n```\n\nSource: [package.json:16-18](https://github.com/vasylenko/markfetch/blob/main/package.json)\n\n## Usage Examples\n\n### Basic fetch to stdout\n\n```bash\nmarkfetch https://en.wikipedia.org/wiki/Markdown\n```\n\n### Save to file\n\n```bash\nmarkfetch https://example.com/article -o output.md\n```\n\n### With timeout override\n\n```bash\nMARKFETCH_TIMEOUT_MS=60000 markfetch https://slow-site.example.com\n```\n\n### Pipeline to another tool\n\n```bash\nmarkfetch https://example.com/doc | grep -A5 \"## Installation\"\n```\n\nSource: [README.md](https://github.com/vasylenko/markfetch/blob/main/README.md)\n\n---\n\n<a id='mcp-server'></a>\n\n## MCP Server Integration\n\n### Related Pages\n\nRelated topics: [Quick Start Guide](#quickstart), [CLI Usage](#cli-usage), [Write Sandbox Security](#write-sandbox)\n\n<details>\n<summary>Relevant Source Files</summary>\n\nThe following source files were used to generate this page:\n\n- [src/mcp.ts](https://github.com/vasylenko/markfetch/blob/main/src/mcp.ts)\n- [src/index.ts](https://github.com/vasylenko/markfetch/blob/main/src/index.ts)\n- [package.json](https://github.com/vasylenko/markfetch/blob/main/package.json)\n- [src/cli.ts](https://github.com/vasylenko/markfetch/blob/main/src/cli.ts)\n- [src/core.ts](https://github.com/vasylenko/markfetch/blob/main/src/core.ts)\n</details>\n\n# MCP Server Integration\n\n## Overview\n\nThe MCP (Model Context Protocol) Server Integration is the primary interface for AI agents to fetch web content as clean markdown. 
Markfetch exposes a single MCP tool, `fetch_markdown`, that accepts a URL and returns extracted markdown content, enabling language models like Claude to access web information through a standardized protocol.\n\nThe MCP server operates as a stdio-based server, meaning it communicates exclusively through standard input and standard output streams. This design lets the server integrate with MCP clients including Claude Desktop, Claude Code, Cursor, and Goose.\n\n## Architecture\n\n### Entry Point Dispatcher\n\nThe `src/index.ts` file implements an argv-discriminated dispatcher that determines whether to start the MCP server or the CLI based on the presence of command-line arguments:\n\n```typescript\nif (process.argv.length === 2) {\n  await import(\"./mcp.js\");\n} else {\n  await import(\"./cli.js\");\n}\n```\n\n**Source: [src/index.ts:26-29]()**\n\nWhen `process.argv.length === 2`, the process was invoked without arguments; this is the standard pattern MCP clients use when spawning a server. 
Any extra argument (URL, flags, `--help`) routes to the CLI adapter.\n\n### Module Isolation\n\nThe dynamic import pattern ensures complete module isolation:\n\n```mermaid\ngraph TD\n    A[markfetch entry] --> B{argv.length === 2?}\n    B -->|Yes| C[Lazy import: mcp.ts]\n    B -->|No| D[Lazy import: cli.ts]\n    C --> E[\"@modelcontextprotocol/sdk loaded\"]\n    D --> F[commander loaded]\n    E -.-> G[Never reaches console.log]\n    F -.-> H[Can use console.log]\n```\n\n**Source: [src/index.ts:18-22]()**\n\nThis architecture enforces the \"stdout is reserved for MCP frames\" invariant structurally: the MCP path never imports `cli.ts`, so code that calls `console.log` is unreachable from the MCP execution path.\n\n## MCP Server Implementation\n\n### Server Initialization\n\nThe MCP server is initialized using the `@modelcontextprotocol/sdk` package:\n\n```typescript\nconst server = new McpServer({ name: \"markfetch\", version: \"0.6.0\" });\n```\n\n**Source: [src/mcp.ts:20]()**\n\n### Tool Registration\n\nThe server registers a single tool `fetch_markdown` with a Zod-based input schema:\n\n```typescript\nserver.registerTool(\n  \"fetch_markdown\",\n  {\n    description: \"Fetch a single public HTTP/S URL and return its main article content as clean markdown...\",\n    inputSchema: {\n      url: z.string().url().describe(\"Absolute http(s) URL of the page to fetch...\"),\n      savePath: z.string().refine(isAbsolute, \"savePath must be an absolute filesystem path\").optional().describe(\"Optional. When provided...\")\n    }\n  },\n  async ({ url, savePath }) => {\n    // Implementation\n  }\n);\n```\n\n**Source: [src/mcp.ts:22-47]()**\n\n### Tool Input Schema\n\n| Parameter | Type | Required | Description |\n|-----------|------|----------|-------------|\n| `url` | string | Yes | Absolute http(s) URL of the page to fetch. The server follows redirects automatically. No authentication headers, cookies, or session state are sent. 
|\n| `savePath` | string | No | Optional absolute filesystem path. When provided, the fetched markdown is written to this path instead of returned inline. |\n\nThe `url` parameter is validated using Zod's `.url()` method to ensure a valid URL format. The `savePath` parameter must be an absolute path, enforced by the `.refine(isAbsolute, ...)` check.\n\n### Response Format\n\nThe tool returns a response in this structure:\n\n```typescript\n{\n  content: [{ type: \"text\", text: \"markdown content or [code] message\" }],\n  isError: boolean\n}\n```\n\n**Source: [src/mcp.ts:8-12]()**\n\n## Error Handling\n\n### Error Code System\n\nThe MCP adapter uses a uniform error code system with 8 deterministic codes:\n\n| Error Code | Description | Source |\n|------------|-------------|--------|\n| `network_error` | DNS/TCP/TLS failure or unexpected internal error | core.ts |\n| `http_error` | Upstream returned non-2xx status | core.ts |\n| `timeout` | Per-request budget exceeded | core.ts |\n| `unsupported_content_type` | Response was not text/html or application/xhtml+xml | core.ts |\n| `extraction_failed` | Readability returned no article content | core.ts |\n| `too_large` | Response or markdown exceeded MARKFETCH_MAX_BYTES | core.ts |\n| `save_failed` | writeFile failed (permission denied, missing directory) | core.ts |\n| `save_forbidden` | savePath resolves outside allowed write roots | src/mcp.ts |\n\n### Error Result Factory\n\n```typescript\nfunction errorResult(code: ErrorCode, message: string) {\n  return {\n    content: [{ type: \"text\" as const, text: `[${code}] ${message}` }],\n    isError: true,\n  };\n}\n```\n\n**Source: [src/mcp.ts:8-12]()**\n\n### Error Propagation Pattern\n\nIn version 0.5.0, error handling was refactored so that core functions now `throw MarkfetchError` instead of returning error results inline. 
Both the MCP and CLI adapters catch these exceptions and convert them to their respective output formats.\n\n**Source: [CHANGELOG.md:19-21]()**\n\n## Write Sandbox (MCP-Specific)\n\nThe MCP server implements a write sandbox that restricts `savePath` operations to a set of allowed root directories.\n\n### Default Allowed Roots\n\nBy default, the allowed set is:\n- `os.tmpdir()` (system temp directory)\n- `process.cwd()` (current working directory)\n\nEach path is resolved via `fs.realpath` at startup to handle symlinks.\n\n### Configuration\n\nThe `MARKFETCH_ALLOWED_WRITE_ROOTS` environment variable overrides the default set entirely:\n\n```json\n{\n  \"mcpServers\": {\n    \"markfetch\": {\n      \"command\": \"npx\",\n      \"args\": [\"-y\", \"markfetch\"],\n      \"env\": {\n        \"MARKFETCH_ALLOWED_WRITE_ROOTS\": \"/Users/me/markfetch-out:/tmp\"\n      }\n    }\n  }\n}\n```\n\n**Source: [README.md:89-100]()**\n\n### Security Rationale\n\nThe sandbox is MCP-only by design. The CLI is unrestricted because \"a human at the shell is the security boundary.\" The asymmetry exists because the MCP tool is driven by a language model, which may be steered by content from a page it just fetched.\n\n**Source: [README.md:102-104]()**\n\n## Request Flow\n\n```mermaid\nsequenceDiagram\n    participant Client as MCP Client\n    participant MCP as MCP Server\n    participant Core as fetchMarkdown()\n    participant Fetch as HTTP Fetcher\n\n    Client->>MCP: fetch_markdown({url, savePath?})\n    MCP->>Core: fetchMarkdown({url, savePath})\n    Core->>Fetch: GET url (with Chrome fingerprint)\n    Fetch-->>Core: HTML response\n    Core->>Core: Readability parsing\n    Core->>Core: Turndown conversion\n    alt savePath provided\n        Core->>Core: Write to file (within sandbox)\n    end\n    Core-->>MCP: {markdown, bytes, savedTo?}\n    MCP-->>Client: {content: [{text: markdown}], isError: false}\n```\n\n## Environment Configuration\n\n| Variable | Default | Purpose | MCP-Specific 
|\n|----------|---------|---------|--------------|\n| `MARKFETCH_TIMEOUT_MS` | `30000` | Per-request timeout in ms | No |\n| `MARKFETCH_MAX_BYTES` | `5000000` | Cap on response body and extracted markdown | No |\n| `MARKFETCH_USER_AGENT` | Chrome 130 string | Override the User-Agent header | No |\n| `MARKFETCH_ALLOWED_WRITE_ROOTS` | `os.tmpdir()` + `process.cwd()` | Permitted write roots for savePath | **Yes** |\n\n**Source: [src/mcp.ts:1-5](), [README.md:68-75]()**\n\n## Integration with Clients\n\n### Claude Desktop / Claude Code\n\n```bash\nclaude mcp add --scope user markfetch -- npx -y markfetch\n```\n\n**Source: [README.md:40-43]()**\n\n### Codex\n\n```bash\ncodex mcp add markfetch -- npx -y markfetch\n```\n\n**Source: [README.md:46-48]()**\n\n### Manual Configuration\n\n```json\n{\n  \"mcpServers\": {\n    \"markfetch\": {\n      \"command\": \"npx\",\n      \"args\": [\"-y\", \"markfetch\"]\n    }\n  }\n}\n```\n\n**Source: [README.md:52-58]()**\n\n## Dependencies\n\nThe MCP server depends on:\n\n| Package | Version | Purpose |\n|---------|---------|---------|\n| `@modelcontextprotocol/sdk` | ^1.29.0 | MCP protocol implementation |\n| `zod` | ^3.0.0 | Input schema validation |\n| `@mozilla/readability` | ^0.5.0 | Article extraction |\n| `turndown` | ^7.0.0 | HTML to Markdown conversion |\n| `undici` | ^8.2.0 | HTTP client |\n| `linkedom` | ^0.18.0 | DOM parsing |\n\n**Source: [package.json:36-47]()**\n\n---\n\n<a id='environment-variables'></a>\n\n## Environment Variables\n\n### Related Pages\n\nRelated topics: [HTTP/2 Fingerprinting](#http-fingerprinting), [Write Sandbox Security](#write-sandbox), [Error Handling](#error-handling)\n\n<details>\n<summary>Relevant Source Files</summary>\n\nThe following source files were used to generate this page:\n\n- [src/core.ts](https://github.com/vasylenko/markfetch/blob/main/src/core.ts)\n- [src/mcp.ts](https://github.com/vasylenko/markfetch/blob/main/src/mcp.ts)\n- [src/sandbox.ts](https://github.com/vasylenko/markfetch/blob/main/src/sandbox.ts)\n- 
[README.md](https://github.com/vasylenko/markfetch/blob/main/README.md)\n</details>\n\n# Environment Variables\n\nmarkfetch uses environment variables to configure runtime behavior at startup. These variables control network timeouts, response size limits, HTTP fingerprinting, and file write permissions for the MCP server.\n\n## Overview\n\nEnvironment variables in markfetch serve as the primary configuration mechanism. Unlike per-request options, these settings apply globally to every operation and are validated once at process startup. This fail-fast design prevents misconfiguration from producing confusing per-request errors later.\n\n```mermaid\ngraph TD\n    A[Process Start] --> B[Validate MARKFETCH_TIMEOUT_MS]\n    A --> C[Validate MARKFETCH_MAX_BYTES]\n    A --> D[Validate MARKFETCH_USER_AGENT]\n    A --> E[Build MARKFETCH_ALLOWED_WRITE_ROOTS]\n    B --> F{Valid?}\n    C --> F\n    D --> F\n    E --> F\n    F -->|Yes| G[Server Ready]\n    F -->|No| H[Exit with stderr error]\n```\n\nAll validation occurs before the server begins accepting requests. Invalid values cause immediate process termination with a descriptive error message written to stderr.\n\n## Configuration Variables\n\n### MARKFETCH_TIMEOUT_MS\n\n| Property | Value |\n|----------|-------|\n| Default | `30000` (30 seconds) |\n| Purpose | Per-request timeout in milliseconds |\n| Type | Positive integer |\n\nControls the maximum duration allowed for any single HTTP request, including DNS resolution, TCP connection, TLS handshake, and response body transfer.\n\n```typescript\nconst config = {\n  timeoutMs: intEnv(\"MARKFETCH_TIMEOUT_MS\", 30_000),\n};\n```\n\nValidation rejects non-positive integers, non-integer values, and non-finite numbers (NaN, Infinity). 
A malformed value produces:\n\n```\n[core] Error: Invalid MARKFETCH_TIMEOUT_MS=\"abc\" — expected a positive integer.\n```\n\nSources: [src/core.ts:1-50]()\n\n### MARKFETCH_MAX_BYTES\n\n| Property | Value |\n|----------|-------|\n| Default | `5000000` (5 MB, about 4.77 MiB) |\n| Purpose | Cap on response body and extracted markdown |\n| Type | Positive integer |\n\nBoth the raw HTTP response body and the final extracted markdown are checked against this limit. If either exceeds the cap, the operation returns a `too_large` error.\n\n```typescript\nconst config = {\n  maxBytes: intEnv(\"MARKFETCH_MAX_BYTES\", 5_000_000),\n};\n```\n\nSources: [src/core.ts:1-50]()\n\n### MARKFETCH_USER_AGENT\n\n| Property | Value |\n|----------|-------|\n| Default | `Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/130.0.0.0 Safari/537.36` |\n| Purpose | HTTP User-Agent header and Sec-CH-UA-* client hints |\n| Type | String (must contain \"Chrome\") |\n\nThe User-Agent string determines both the HTTP header sent to servers and the derived Sec-CH-UA-* client hints. The hints are derived at startup and remain fixed for the process lifetime.\n\n```mermaid\ngraph LR\n    A[MARKFETCH_USER_AGENT] --> B[deriveClientHints]\n    B --> C[Sec-CH-UA]\n    B --> D[Sec-CH-UA-Mobile]\n    B --> E[Sec-CH-UA-Platform]\n    A --> F[User-Agent Header]\n```\n\n```typescript\nfunction deriveClientHints(ua: string): {\n  brands: string;\n  mobile: string;\n  platform: string;\n} {\n  const versionMatch = /\\bChrome\\/(\\d+)/.exec(ua);\n  if (!versionMatch) {\n    throw new Error(\n      `Invalid MARKFETCH_USER_AGENT=${JSON.stringify(ua)} — expected a Chrome User-Agent containing \"Chrome/...\"`\n    );\n  }\n  // ...\n}\n```\n\nThe UA must contain a Chrome version string. 
Non-Chrome UAs fail fast at startup to prevent fingerprinting mismatches that would increase the risk of bot detection.\n\nSources: [src/core.ts:1-50]()\n\n## Write Sandbox (MCP-Only)\n\n### MARKFETCH_ALLOWED_WRITE_ROOTS\n\n| Property | Value |\n|----------|-------|\n| Default | `os.tmpdir() ∪ process.cwd()` |\n| Purpose | Restrict MCP `savePath` writes to specific directories |\n| Type | Platform-delimiter-separated absolute paths |\n| Platform | POSIX: `:` delimiter; Windows: `;` delimiter |\n| Mode | MCP-only (CLI has no sandbox) |\n\nThis variable applies exclusively to the MCP server mode. The CLI operates without restriction, treating the human at the shell as the security boundary.\n\n```mermaid\ngraph TD\n    A[MCP savePath request] --> B{Path inside allowed roots?}\n    B -->|Yes| C[Write file]\n    B -->|No| D[Return save_forbidden error]\n    C --> E[Confirmation to client]\n    D --> F[No file created]\n```\n\nWhen set, the value **replaces** the defaults entirely rather than merging with them. To retain access to the default directories, include them explicitly:\n\n```json\n{\n  \"mcpServers\": {\n    \"markfetch\": {\n      \"command\": \"npx\",\n      \"args\": [\"-y\", \"markfetch\"],\n      \"env\": {\n        \"MARKFETCH_ALLOWED_WRITE_ROOTS\": \"/Users/me/markfetch-out:/tmp\"\n      }\n    }\n  }\n}\n```\n\nOn Windows:\n\n```json\n{\n  \"mcpServers\": {\n    \"markfetch\": {\n      \"command\": \"npx\",\n      \"args\": [\"-y\", \"markfetch\"],\n      \"env\": {\n        \"MARKFETCH_ALLOWED_WRITE_ROOTS\": \"C:\\Users\\me\\markfetch-out;C:\\Users\\me\\AppData\\Local\\Temp\"\n      }\n    }\n  }\n}\n```\n\n### Validation Rules\n\nEach entry in the list is validated as follows:\n\n1. It must be an absolute path (relative paths fail fast)\n2. It must be an existing directory at startup\n3. It is resolved through symlinks before containment checks\n\n```typescript\nfunction buildAllowedRoots(envValue?: string): string[] {\n  // ...\n}\n```\n\nSymlinks pointing outside the sandbox are blocked. 
The canonicalized path flows from the containment check into `writeFile`, ensuring the file is created exactly at the validated location.\n\nSources: [src/sandbox.ts:1-50](), [src/mcp.ts:1-50]()\n\n## Error Codes\n\nWhen environment variable validation fails, markfetch writes to stderr and exits with a non-zero status:\n\n| Error Code | Trigger | Exit Status |\n|------------|---------|-------------|\n| Startup failure | Invalid MARKFETCH_TIMEOUT_MS | Non-zero |\n| Startup failure | Invalid MARKFETCH_MAX_BYTES | Non-zero |\n| Startup failure | Non-Chrome MARKFETCH_USER_AGENT | Non-zero |\n| Startup failure | Malformed MARKFETCH_ALLOWED_WRITE_ROOTS | Non-zero |\n| Runtime error | `save_forbidden` (MCP only) | Non-zero |\n\nStartup failures caused by invalid environment values (e.g., `MARKFETCH_TIMEOUT_MS=\"abc\"`) differ from request-scoped errors like `http_error` or `timeout`. Environment misconfiguration is always fatal at startup.\n\n## Environment Variable Summary\n\n| Variable | Default | Scope | Purpose |\n|----------|---------|-------|---------|\n| `MARKFETCH_TIMEOUT_MS` | `30000` | Both | Request timeout in ms |\n| `MARKFETCH_MAX_BYTES` | `5000000` | Both | Response and markdown size cap |\n| `MARKFETCH_USER_AGENT` | Chrome 130 string | Both | HTTP fingerprint |\n| `MARKFETCH_ALLOWED_WRITE_ROOTS` | tmpdir + cwd | MCP only | Write sandbox boundaries |\n\n## Configuration Priority\n\nEnvironment variables set at process startup take precedence over all other configuration. 
There is no runtime override mechanism; changing these values requires restarting the server.\n\n```mermaid\ngraph TD\n    A[Environment Variable] --> B[Validated at Startup]\n    B --> C[Stored in config object]\n    C --> D[Used by core.ts pipeline]\n    D --> E[HTTP Request]\n    D --> F[File Write]\n    D --> G[Response Validation]\n```\n\n## Security Considerations\n\nThe write sandbox exists because the MCP tool is driven by a language model, which may be steered by content from a page it just fetched. Without sandboxing, a malicious page could induce the model to request writes outside expected directories.\n\nThe CLI intentionally has no sandbox: direct human invocation at the shell establishes the trust boundary.\n\nSources: [README.md:1-100]()\n</details>\n\n---\n\n<a id='write-sandbox'></a>\n\n## Write Sandbox Security\n\n### Related Pages\n\nRelated topics: [MCP Server Integration](#mcp-server), [Environment Variables](#environment-variables), [Error Handling](#error-handling)\n\n<details>\n<summary>Relevant source files</summary>\n\nThe following source files were used to generate this page:\n\n- [src/sandbox.ts](https://github.com/vasylenko/markfetch/blob/main/src/sandbox.ts)\n- [src/mcp.ts](https://github.com/vasylenko/markfetch/blob/main/src/mcp.ts)\n- [src/cli.ts](https://github.com/vasylenko/markfetch/blob/main/src/cli.ts)\n- [README.md](https://github.com/vasylenko/markfetch/blob/main/README.md)\n- [CHANGELOG.md](https://github.com/vasylenko/markfetch/blob/main/CHANGELOG.md)\n- [package.json](https://github.com/vasylenko/markfetch/blob/main/package.json)\n</details>\n\n# Write Sandbox Security\n\n## Overview\n\nThe Write Sandbox is a security mechanism in markfetch that restricts filesystem writes initiated via the MCP (Model Context Protocol) interface to a configurable set of allowed root directories. 
This protection prevents a language model, which may be influenced by fetched content, from writing files to arbitrary locations on the host system.\n\nThe sandbox enforces path containment by resolving symlinks and comparing canonicalized paths against the configured allowed roots. Any attempted write outside the sandbox boundary returns a `save_forbidden` error and the file is never created.\n\n## Purpose and Scope\n\n### Security Boundary\n\nThe sandbox exists because MCP tools are driven by a language model that can be steered by content from pages it fetches. Without containment:\n\n- A malicious or compromised webpage could instruct the LLM to write files to sensitive locations (e.g., `~/.ssh/authorized_keys`, `~/.bashrc`)\n- Path traversal attempts via symlinks could escape expected boundaries\n- Untrusted fetched content could modify configuration files or inject malicious code\n\nThe CLI mode intentionally has **no sandbox**. A human at the shell is considered the security boundary, as the user has direct control over command invocation and can review output before it reaches any model.\n\n### Scope Limitations\n\n| Scope | Sandboxed? |\n|-------|------------|\n| MCP server (`fetch_markdown` tool) | Yes |\n| CLI mode (`markfetch <url>`) | No |\n| Direct `node` execution | No |\n\nSources: [README.md:68-70](https://github.com/vasylenko/markfetch/blob/main/README.md)\n\n## Configuration\n\n### Environment Variable\n\n| Variable | Type | Default | Description |\n|----------|------|---------|-------------|\n| `MARKFETCH_ALLOWED_WRITE_ROOTS` | String | `os.tmpdir()` + `process.cwd()` | Path-delimiter-separated list of absolute paths permitted as MCP `savePath` write roots |\n\n### Path Delimiters\n\nThe delimiter varies by platform:\n\n| Platform | Delimiter | Example |\n|----------|-----------|---------|\n| POSIX (Linux, macOS) | `:` | `/tmp:/home/user/markfetch-out` |\n| Windows | `;` | `C:\Users\me\markfetch-out;C:\Temp` |\n\n### Behavior Rules\n\n1. 
**Replacement, not merge**: When set, the variable replaces the defaults entirely. To retain access to `os.tmpdir()` or `process.cwd()`, explicitly include them.\n\n2. **Validation at startup**: Malformed values (non-absolute entries, nonexistent directories) cause the server to fail fast with an error on stderr.\n\n3. **Realpath resolution**: Each root is resolved once via `fs.realpath` at startup to canonicalize symlinks.\n\nSources: [README.md:71-89](https://github.com/vasylenko/markfetch/blob/main/README.md)\n\n### Configuration Example\n\n**POSIX:**\n```json\n{\n  \"mcpServers\": {\n    \"markfetch\": {\n      \"command\": \"npx\",\n      \"args\": [\"-y\", \"markfetch\"],\n      \"env\": {\n        \"MARKFETCH_ALLOWED_WRITE_ROOTS\": \"/Users/me/markfetch-out:/tmp\"\n      }\n    }\n  }\n}\n```\n\n**Windows:**\n```json\n{\n  \"mcpServers\": {\n    \"markfetch\": {\n      \"command\": \"npx\",\n      \"args\": [\"-y\", \"markfetch\"],\n      \"env\": {\n        \"MARKFETCH_ALLOWED_WRITE_ROOTS\": \"C:\\Users\\me\\markfetch-out;C:\\Users\\me\\AppData\\Local\\Temp\"\n      }\n    }\n  }\n}\n```\n\n## Security Model\n\n### Path Resolution Flow\n\n```mermaid\ngraph TD\n    A[User provides savePath] --> B{Is path absolute?}\n    B -->|No| E[Error: savePath must be absolute]\n    B -->|Yes| C[Resolve via fs.realpath]\n    C --> D{Is resolved path inside allowed roots?}\n    D -->|Yes| F[Allow write to resolved path]\n    D -->|No| G[Return save_forbidden error]\n    \n    H[Allowed roots from env] --> I[Realpath-resolved at startup]\n    I --> D\n```\n\n### Symlink Handling\n\nThe sandbox protects against symlink-based escapes:\n\n1. **Resolve before check**: Symlinks are resolved via `fs.realpath` before containment validation\n2. **Reuse the resolved path at write time**: The canonicalized path from the validation check flows directly into `writeFile`, so the path written is the path that was checked\n3. 
**No lexical comparison**: A path like `<sandbox>/link/..` is not compared lexically against the roots; it is resolved first, then validated\n\nThis prevents attacks where a symlink planted inside the sandbox points outside, collapsing lexically for the check but resolving to an external location at write time.\n\nSources: [CHANGELOG.md:17-25](https://github.com/vasylenko/markfetch/blob/main/CHANGELOG.md)\n\n### Platform-Specific Behaviors\n\n| Platform | Case Sensitivity | Notes |\n|----------|------------------|-------|\n| Linux/macOS | Case-sensitive | Paths must match exactly |\n| Windows | Case-insensitive | `C:\Users\Bob` and `c:\users\bob` are equivalent |\n\nOn Windows, the containment check lowercases both the root and target paths before comparison.\n\nSources: [src/sandbox.ts:28-30](https://github.com/vasylenko/markfetch/blob/main/src/sandbox.ts)\n\n## Core Implementation\n\n### API Design\n\nThe sandbox module exposes two primary functions:\n\n```typescript\nfunction buildAllowedRoots(env: Record<string, string | undefined>): string[]\nfunction validateSavePath(\n  savePath: string,\n  roots: string[]\n): { ok: boolean; resolved?: string; reason?: string }\n```\n\n### `buildAllowedRoots()`\n\nParses `MARKFETCH_ALLOWED_WRITE_ROOTS` from environment variables:\n\n| Parameter | Type | Description |\n|-----------|------|-------------|\n| `env` | `Record<string, string \\| undefined>` | Process environment variables |\n\n| Return Type | Description |\n|-------------|-------------|\n| `string[]` | Array of absolute, realpath-resolved directory paths |\n\n**Logic:**\n1. If `MARKFETCH_ALLOWED_WRITE_ROOTS` is unset: return `[os.tmpdir(), process.cwd()]`\n2. If set: split by platform delimiter, validate each is absolute and exists\n3. 
Resolve each via `fs.realpath` for canonical form\n\n### `validateSavePath()`\n\nValidates that a save path is within the allowed roots:\n\n| Parameter | Type | Description |\n|-----------|------|-------------|\n| `savePath` | `string` | The requested save path |\n| `roots` | `string[]` | Allowed root directories |\n\n| Return Type | Description |\n|-------------|-------------|\n| `{ ok: true, resolved: string }` | Path is allowed; `resolved` is the canonicalized path for writing |\n| `{ ok: false, reason: string }` | Path is outside sandbox; `reason` describes the violation |\n\n**Validation steps:**\n1. Resolve `savePath` via `fs.realpath`\n2. For each root, compute the relative path from the root to the resolved target\n3. If the relative path is empty (same directory) or does not start with `..` and is not absolute: allow\n4. Otherwise: reject with a reason listing the allowed roots\n\nSources: [src/sandbox.ts:1-50](https://github.com/vasylenko/markfetch/blob/main/src/sandbox.ts)\n\n## Error Handling\n\n### Error Codes\n\n| Code | Condition | Response |\n|------|-----------|----------|\n| `save_forbidden` | `savePath` resolves outside allowed roots | No file written; MCP returns error |\n| `save_failed` | `savePath` is valid but `writeFile` fails | No file written; MCP returns error |\n\n### Error Message Format\n\nA `save_forbidden` error uses the format:\n```\n[save_forbidden] '<path>' is outside the allowed write roots: ['/allowed/root1', '/allowed/root2']\n```\n\nThis provides:\n- The attempted path\n- The reason for rejection\n- The list of allowed roots for debugging\n\nSources: [src/mcp.ts:8-13](https://github.com/vasylenko/markfetch/blob/main/src/mcp.ts)\n\n## MCP Integration\n\n### Tool Schema\n\n```typescript\nserver.registerTool(\"fetch_markdown\", {\n  inputSchema: {\n    url: z.string().url().describe(\"...\"),\n    savePath: z.string()\n      .refine(isAbsolute, \"savePath must be an absolute filesystem path\")\n      .optional()\n      .describe(\"Optional. 
When provided, the fetched markdown is written to this absolute filesystem path...\")\n  }\n});\n```\n\n### Validation Flow\n\n1. MCP adapter receives `savePath` parameter\n2. Validates path is absolute (via Zod schema)\n3. Calls `validateSavePath(savePath, allowedRoots)`\n4. If `ok: false`: throw `MarkfetchError` with `save_forbidden` code\n5. If `ok: true`: use `resolved` path for `writeFile`\n\nSources: [src/mcp.ts:24-35](https://github.com/vasylenko/markfetch/blob/main/src/mcp.ts)\n\n## Architecture Diagram\n\n```mermaid\ngraph LR\n    subgraph MCP_Client\n        A[LLM sends fetch_markdown with savePath]\n    end\n    \n    subgraph MCP_Server\n        B[src/mcp.ts - MCP adapter]\n        C[src/core.ts - fetchMarkdown]\n        D[src/sandbox.ts - validateSavePath]\n    end\n    \n    subgraph File_System\n        E[fs.realpath resolution]\n        F[fs.writeFile]\n    end\n    \n    A --> B\n    B -->|validate path| D\n    D -->|resolve symlink| E\n    E -->|check containment| D\n    D -->|ok: true| C\n    C -->|write markdown| F\n    \n    D -->|ok: false| B\n    B -->|save_forbidden| A\n```\n\n## CLI vs MCP Behavior\n\n| Aspect | CLI Mode | MCP Mode |\n|--------|----------|----------|\n| Write sandbox | None | Enforced |\n| Path validation | Not performed | Required |\n| Symlink resolution | Not performed | Required |\n| `savePath` parameter | Optional, `-o` flag | Optional, tool parameter |\n| Relative path resolution | Resolves against cwd | Not allowed (must be absolute) |\n\nThe CLI adapter resolves relative paths internally for convenience, but the MCP adapter requires absolute paths and enforces the sandbox.\n\nSources: [src/cli.ts:6-18](https://github.com/vasylenko/markfetch/blob/main/src/cli.ts)\n\n## Security Considerations\n\n### Attack Vectors Mitigated\n\n1. **Path traversal**: `../../etc/passwd` is resolved before checking\n2. **Symlink escape**: `<sandbox>/link_to_external` is resolved and rejected\n3. 
**Case confusion (Windows)**: `C:\Users\Bob` equals `c:\users\bob`\n4. **Tilde expansion**: Not performed; shell expands `~` before argv reaches process\n\n### Remaining Trust Boundaries\n\n| Trust Level | Description |\n|-------------|-------------|\n| Filesystem permissions | Sandbox does not override OS file permissions |\n| Network | Does not prevent network-based attacks |\n| Content injection | Does not sanitize markdown content before writing |\n\n## Related Files\n\n| File | Role |\n|------|------|\n| `src/sandbox.ts` | Core sandbox validation logic |\n| `src/mcp.ts` | MCP server adapter, uses sandbox |\n| `src/cli.ts` | CLI adapter, no sandbox |\n| `src/core.ts` | Core fetch pipeline |\n| `README.md` | User documentation and configuration |\n| `CHANGELOG.md` | Historical security fix for symlink escape |\n\n## Changelog\n\n| Version | Change |\n|---------|--------|\n| 0.6.0 | Current release with full sandbox implementation |\n| 0.5.0 | CLI mode added (unrestricted by design) |\n| < 0.5.0 | MCP-only, sandbox introduced |\n\nSources: [package.json:3](https://github.com/vasylenko/markfetch/blob/main/package.json)\n\n---\n\n<a id='error-handling'></a>\n\n## Error Handling\n\n### Related Pages\n\nRelated topics: [Processing Pipeline](#processing-pipeline), [Write Sandbox Security](#write-sandbox), [Environment Variables](#environment-variables)\n\n<details>\n<summary>Relevant source files</summary>\n\nThe following source files were used to generate this page:\n\n- [src/core.ts](https://github.com/vasylenko/markfetch/blob/main/src/core.ts)\n- [src/cli.ts](https://github.com/vasylenko/markfetch/blob/main/src/cli.ts)\n- [src/mcp.ts](https://github.com/vasylenko/markfetch/blob/main/src/mcp.ts)\n- [src/sandbox.ts](https://github.com/vasylenko/markfetch/blob/main/src/sandbox.ts)\n- [README.md](https://github.com/vasylenko/markfetch/blob/main/README.md)\n- [CHANGELOG.md](https://github.com/vasylenko/markfetch/blob/main/CHANGELOG.md)\n</details>\n\n# Error Handling\n\nmarkfetch implements a deterministic, structured error handling system that 
provides consistent error reporting across both CLI and MCP interfaces. All errors are categorized into specific codes that enable precise failure diagnosis and appropriate recovery strategies.\n\n## Error Code Reference\n\nmarkfetch defines eight deterministic error codes that cover all failure scenarios. Each code is designed to be actionable, helping callers understand exactly what went wrong and how to respond.\n\n| Error Code | Meaning | Typical Cause |\n|---|---|---|\n| `network_error` | DNS, TCP, or TLS failure | Firewall blocking, network unavailable, invalid hostname |\n| `http_error` | Non-2xx HTTP response | 404 page not found, 403 forbidden, 500 server error |\n| `timeout` | Request exceeded `MARKFETCH_TIMEOUT_MS` | Slow server, large page, network latency |\n| `unsupported_content_type` | Response is not HTML | Binary files, JSON APIs, PDF documents |\n| `extraction_failed` | Readability found no article content | Pure client-rendered SPAs with no static HTML |\n| `too_large` | Body or markdown exceeded `MARKFETCH_MAX_BYTES` | Very large articles with embedded media |\n| `save_failed` | File write operation failed | Missing parent directory, permission denied |\n| `save_forbidden` | Save path outside allowed write roots | Path traverses symlink outside sandbox |\n\nSources: [README.md](README.md)\n\n## Error Architecture\n\nThe error handling system follows a layered architecture where core validation and error creation happen in `src/core.ts`, while each adapter (CLI and MCP) provides interface-specific error formatting and reporting.\n\n```mermaid\ngraph TD\n    A[Request] --> B[core.ts Validation]\n    B --> C{Error Condition?}\n    C -->|No| D[Successful Fetch]\n    C -->|Yes| E[MarkfetchError Thrown]\n    E --> F[Adapter Layer]\n    F --> G[CLI Adapter]\n    F --> H[MCP Adapter]\n    G --> I[\"stderr: [code] message\"]\n    H --> J[\"content[0].text: [code] message\"]\n    J --> K[isError: true]\n```\n\nSources: [src/core.ts](src/core.ts), 
[src/cli.ts](src/cli.ts), [src/mcp.ts](src/mcp.ts)\n\n## MarkfetchError Class\n\nThe central error type is `MarkfetchError`, which encapsulates both the error code and human-readable message. This class serves as the single error type thrown throughout the application.\n\n```typescript\nclass MarkfetchError {\n  constructor(\n    public readonly code: ErrorCode,\n    public readonly message: string\n  ) {}\n}\n```\n\nSources: [src/core.ts:1-100](src/core.ts)\n\n## Environment Variable Validation\n\nmarkfetch validates configuration environment variables at startup to fail fast on misconfiguration rather than producing confusing per-request errors.\n\n| Variable | Default | Validation Rules |\n|---|---|---|\n| `MARKFETCH_TIMEOUT_MS` | `30000` | Positive integer |\n| `MARKFETCH_MAX_BYTES` | `5000000` | Positive integer |\n| `MARKFETCH_USER_AGENT` | Chrome 130 UA string | Must contain Chrome substring |\n\nThe `intEnv` function performs validation:\n\n```typescript\nfunction intEnv(name: string, fallback: number): number {\n  const raw = process.env[name];\n  if (raw == null || raw === \"\") return fallback;\n  const n = Number(raw);\n  if (!Number.isFinite(n) || !Number.isInteger(n) || n <= 0) {\n    throw new Error(\n      `Invalid ${name}=${JSON.stringify(raw)} — expected a positive integer.`,\n    );\n  }\n  return n;\n}\n```\n\nSources: [src/core.ts:1-100](src/core.ts)\n\n### User-Agent Validation\n\nThe `MARKFETCH_USER_AGENT` must be a valid Chrome User-Agent string. 
This requirement exists because Sec-CH-UA-* client hints are derived from the User-Agent at startup, and a mismatch creates a stronger bot signal.\n\n```typescript\nfunction deriveClientHints(ua: string): {\n  brands: string;\n  mobile: string;\n  platform: string;\n} {\n  const versionMatch = /\\bChrome\\/(\\d+)/.exec(ua);\n  if (!versionMatch) {\n    throw new Error(\n      `Invalid MARKFETCH_USER_AGENT=${JSON.stringify(ua)} — expected a Chrome User-Agent containing \"Chrome/VERSION\".`,\n    );\n  }\n  // ...\n}\n```\n\nSources: [src/core.ts:1-100](src/core.ts)\n\n## CLI Error Handling\n\nThe CLI adapter catches errors thrown from core and formats them for stderr output. Error output follows a consistent `[code] message` format that matches the MCP error format exactly.\n\n```typescript\ntry {\n  const { markdown, bytes, savedTo } = await fetchMarkdown({\n    url,\n    savePath,\n  });\n  // ... success handling\n} catch (err) {\n  const { code, message } = classifyError(err);\n  console.error(`[${code}] ${message}`);\n  // Use exitCode so pending output drains before process exits\n  process.exitCode = 1;\n}\n```\n\nSources: [src/cli.ts:1-50](src/cli.ts)\n\n### CLI Exit Codes\n\n| Scenario | Exit Code | Output |\n|---|---|---|\n| Success (stdout) | 0 | Raw markdown |\n| Success (save to file) | 0 | `Saved X bytes to /path` |\n| Any error | 1 | `[code] message` to stderr |\n\nThe use of `process.exitCode = 1` (rather than `process.exit(1)`) ensures pending stdout/stderr output drains before the process terminates, which is important when stdout is piped to a slow consumer.\n\nSources: [src/cli.ts:1-50](src/cli.ts)\n\n## MCP Error Handling\n\nThe MCP adapter returns errors in a format compatible with the MCP protocol. 
Errors appear in the `content[0].text` field with `isError: true` set.\n\n```typescript\nfunction errorResult(code: ErrorCode, message: string) {\n  return {\n    content: [{ type: \"text\" as const, text: `[${code}] ${message}` }],\n    isError: true,\n  };\n}\n```\n\nSources: [src/mcp.ts:1-50](src/mcp.ts)\n\n### MCP Response Structure for Errors\n\n```json\n{\n  \"content\": [\n    {\n      \"type\": \"text\",\n      \"text\": \"[network_error] DNS lookup failed\"\n    }\n  ],\n  \"isError\": true\n}\n```\n\nSources: [src/mcp.ts:1-50](src/mcp.ts)\n\n## Write Sandbox Errors\n\nThe MCP interface enforces a write sandbox that restricts file saves to configured root directories. Errors occur when `savePath` resolves to a location outside the allowed roots.\n\n```typescript\nexport function checkWritePath(\n  target: string,\n  roots: string[],\n): { ok: true; resolved: string } | { ok: false; reason: string } {\n  // ... validation logic\n  return {\n    ok: false,\n    reason: `'${reattached}' is outside the allowed write roots: [${roots.map((r) => `'${r}'`).join(\", \")}]`,\n  };\n}\n```\n\nSources: [src/sandbox.ts:1-100](src/sandbox.ts)\n\n### Allowed Write Roots Configuration\n\n| Platform | Default Roots | Delimiter |\n|---|---|---|\n| POSIX | `os.tmpdir()` + `process.cwd()` | `:` |\n| Windows | `os.tmpdir()` + `process.cwd()` | `;` |\n\nOverride with the `MARKFETCH_ALLOWED_WRITE_ROOTS` environment variable. When set, this **replaces** the defaults entirely rather than merging.\n\nSources: [README.md](README.md)\n\n### Symlink Handling\n\nThe sandbox correctly resolves symlinks to prevent escape attempts like `<sandbox>/link/../out.md` where `link` points outside the sandbox. 
The canonicalized path flows from the containment check into `writeFile`, ensuring the file is created exactly at the validated location.\n\nSources: [CHANGELOG.md](CHANGELOG.md), [src/sandbox.ts:1-100](src/sandbox.ts)\n\n## Error Classification\n\nThe `classifyError` function normalizes different error types into the `MarkfetchError` format used throughout the system:\n\n```typescript\nfunction classifyError(err: unknown): { code: string; message: string } {\n  if (err instanceof MarkfetchError) {\n    return { code: err.code, message: err.message };\n  }\n  if (err instanceof Error) {\n    return { code: \"network_error\", message: err.message };\n  }\n  return { code: \"network_error\", message: String(err) };\n}\n```\n\nSources: [src/core.ts:1-100](src/core.ts)\n\n### Error Source Mapping\n\n| Error Source | Code Produced |\n|---|---|\n| `MarkfetchError` instances | Original code preserved |\n| `Error` instances | `network_error` |\n| Non-Error values | `network_error` with string coercion |\n\n## Unified Error Flow\n\nVersion 0.5.0 introduced a refactoring where three inline `return errorResult(...)` sites in the MCP handler were converted to throw `MarkfetchError` from core uniformly. Both adapters now catch and convert errors consistently.\n\nThis architectural change ensures that both CLI and MCP interfaces produce identical error codes and messages for the same failure conditions.\n\nSources: [CHANGELOG.md](CHANGELOG.md)\n\n## Best Practices for Error Handling\n\n### For MCP Clients\n\n1. Check `isError` field in the response object\n2. Parse the `content[0].text` field for the `[code] message` format\n3. Handle `extraction_failed` gracefully for client-rendered SPAs\n4. Use `savePath` parameter for large responses to avoid tool-result truncation\n\n### For CLI Consumers\n\n1. Redirect stderr to capture error codes\n2. Parse `[code] message` format from stderr\n3. Use `markfetch url 2>&1 | head -1` to get the error\n\n### For Save Operations\n\n1. 
Always use absolute paths for `savePath`\n2. Verify `MARKFETCH_ALLOWED_WRITE_ROOTS` includes your target directory\n3. Check for `save_forbidden` before `save_failed` in error handling logic\n\n---\n\n<a id='development'></a>\n\n## Development Guide\n\n### Related Pages\n\nRelated topics: [Introduction](#introduction), [Quick Start Guide](#quickstart)\n\n<details>\n<summary>Relevant source files</summary>\n\nThe following source files were used to generate this page:\n\n- [package.json](https://github.com/vasylenko/markfetch/blob/main/package.json)\n- [src/core.ts](https://github.com/vasylenko/markfetch/blob/main/src/core.ts)\n- [src/cli.ts](https://github.com/vasylenko/markfetch/blob/main/src/cli.ts)\n- [src/mcp.ts](https://github.com/vasylenko/markfetch/blob/main/src/mcp.ts)\n- [src/sandbox.ts](https://github.com/vasylenko/markfetch/blob/main/src/sandbox.ts)\n- [README.md](https://github.com/vasylenko/markfetch/blob/main/README.md)\n</details>\n\n# Development Guide\n\nThis guide provides comprehensive information for developers who want to understand, extend, or contribute to markfetch.\n\n## Overview\n\nmarkfetch is a Node.js tool that fetches URLs and converts web content to clean markdown. It operates in two modes:\n\n1. **CLI Mode** - Command-line interface for shell integration\n2. **MCP Mode** - Model Context Protocol server for AI agent integration\n\nThe project requires Node.js ≥ 24 and is distributed as an npm package. 
Sources: [package.json:8]()\n\n## Architecture\n\n```mermaid\ngraph TD\n    A[User Input] --> B{process.argv.length}\n    B -->|≥ 2 args| C[CLI Adapter]\n    B -->|Zero args| D[MCP Adapter]\n    \n    C --> E[src/cli.ts]\n    D --> F[src/mcp.ts]\n    \n    E --> G[src/core.ts]\n    F --> G\n    \n    G --> H[undici HTTP Client]\n    G --> I[linkedom HTML Parser]\n    G --> J[@mozilla/readability]\n    G --> K[turndown]\n    \n    H --> L[HTTP Response]\n    I --> M[DOM Document]\n    J --> N[Extracted Article]\n    K --> O[Markdown Output]\n```\n\n### Core Pipeline (src/core.ts)\n\nThe core module implements the main fetch-and-convert pipeline. It orchestrates:\n\n| Component | Role |\n|-----------|------|\n| `undici` | HTTP/2 transport with Chrome-like fingerprinting |\n| `linkedom` | HTML parsing to DOM |\n| `@mozilla/readability` | Article content extraction |\n| `turndown` | HTML to markdown conversion |\n\nSources: [src/core.ts:1-50]()\n\n### Adapters (src/cli.ts & src/mcp.ts)\n\nThe source is organized into the following files:\n\n| File | Purpose |\n|------|---------|\n| `src/core.ts` | Pipeline + errors (shared logic) |\n| `src/mcp.ts` | MCP stdio server adapter |\n| `src/cli.ts` | CLI argv parser + dispatcher |\n| `src/index.ts` | Lazy-import dispatcher based on `process.argv.length` |\n\nSources: [README.md:95-100]()\n\nThe lazy-import dispatcher ensures `console.log` calls in `cli.ts` are never reachable from the MCP path, maintaining the invariant that stdout is reserved for MCP frames. 
Sources: [CHANGELOG.md:45-47]()\n\n## Setting Up the Development Environment\n\n### Prerequisites\n\n- Node.js ≥ 24\n- npm or yarn\n\n### Installation\n\n```bash\n# Clone the repository\ngit clone https://github.com/vasylenko/markfetch.git\ncd markfetch\n\n# Install dependencies\nnpm install\n```\n\n### Available Scripts\n\n| Script | Command | Purpose |\n|--------|---------|---------|\n| `dev` | `npm run dev` | Run source directly with tsx (no build required) |\n| `build` | `npm run build` | Compile TypeScript to JavaScript |\n| `test` | `npm run test` | Run test suite with tsx |\n| `inspect` | `npm run inspect` | Launch MCP inspector for debugging |\n\nSources: [package.json:21-28]()\n\n### Build Process\n\nThe build process consists of two steps:\n\n```bash\n# Compile TypeScript\nnpm run build\n\n# Post-build script (automatically runs after build)\nnpm run postbuild\n```\n\nThe postbuild script (`scripts/postbuild.mjs`) performs additional transformations after TypeScript compilation. Sources: [package.json:26]()\n\n## Project Structure\n\n```\nmarkfetch/\n├── src/\n│   ├── index.ts      # Entry point with argv dispatcher\n│   ├── core.ts       # Core fetch/extract/convert pipeline\n│   ├── cli.ts        # CLI adapter using commander\n│   ├── mcp.ts        # MCP stdio server\n│   └── sandbox.ts    # Write path sandboxing\n├── dist/             # Compiled JavaScript output\n├── tests/            # Test fixtures and test files\n├── scripts/\n│   └── postbuild.mjs # Post-compilation transformations\n└── docs/\n    └── SPEC.md       # Detailed specification\n```\n\n## Configuration\n\n### Environment Variables\n\n| Variable | Default | Purpose |\n|----------|---------|---------|\n| `MARKFETCH_TIMEOUT_MS` | `30000` | Per-request timeout in milliseconds |\n| `MARKFETCH_MAX_BYTES` | `5000000` | Cap on response body and extracted markdown |\n| `MARKFETCH_USER_AGENT` | Chrome 130 string | Override the User-Agent header |\n| `MARKFETCH_ALLOWED_WRITE_ROOTS` | `os.tmpdir()` + 
`process.cwd()` | MCP-only write sandbox roots |\n\nSource: [README.md:60-66]()\n\n### Configuration Precedence\n\n1. Environment variables set at startup\n2. Command-line flags (CLI mode)\n3. MCP tool parameters (MCP mode)\n\n## Core API\n\n### fetchMarkdown Function\n\nThe main function exported from `core.ts` works with the following option and result types:\n\n```typescript\ninterface FetchOptions {\n  url: string;\n  savePath?: string;\n}\n\ninterface FetchResult {\n  markdown: string;\n  bytes: number;\n  savedTo?: string;\n}\n```\n\n### Error Handling\n\nThe core module defines eight deterministic error codes:\n\n| Code | Meaning |\n|------|---------|\n| `network_error` | DNS/TCP/TLS failure |\n| `http_error` | Non-2xx HTTP status |\n| `timeout` | Request timeout exceeded |\n| `unsupported_content_type` | Not `text/html` or `application/xhtml+xml` |\n| `extraction_failed` | Readability found no article content |\n| `too_large` | Response or markdown exceeded size cap |\n| `save_failed` | File write failed (permissions, missing directory) |\n| `save_forbidden` | Path outside allowed write roots |\n\nSource: [README.md:71-80]()\n\nThe core throws every error as a `MarkfetchError`; each adapter catches it and converts it to its own protocol's format. Source: [CHANGELOG.md:49-51]()\n\n## Extending the Pipeline\n\n### Adding New HTML Rewrites\n\nThe `rewriteForReadability()` function in `core.ts` handles pre-extraction HTML transformations:\n\n```typescript\nfunction rewriteForReadability(document: Document): void {\n  // Transform <aside class=\"footnote-brackets\"> to <section>\n  // Flatten <details> elements\n  // Replace div.mw-heading with their heading children\n}\n```\n\nTo add new rewrite rules, append them to this function, before its return statement. 
Source: [src/core.ts:120-160]()\n\n### Customizing Markdown Conversion\n\nThe `TURNDOWN` instance is configured with:\n\n| Plugin/Option | Purpose |\n|---------------|---------|\n| `gfm` plugin | GitHub Flavored Markdown support |\n| `keepClasses: true` | Preserve `class=\"language-X\"` for code fences |\n| Custom escape | Handle `-`/`=` after inline elements |\n\nSource: [src/core.ts:50-90]()\n\n### Modifying Error Handling\n\nError handling flows through the `MarkfetchError` class in core:\n\n1. Core throws `MarkfetchError` with code and message\n2. Adapters catch and format for their protocol\n3. CLI: writes `[code] message` to stderr\n4. MCP: returns `{ content: [...], isError: true }`\n\nSource: [src/cli.ts:35-42]() and [src/mcp.ts:15-20]()\n\n## Write Sandbox\n\nThe MCP adapter enforces write path restrictions:\n\n```mermaid\ngraph TD\n    A[MCP savePath] --> B{Absolute path?}\n    B -->|No| C[Refine fails: savePath must be absolute]\n    B -->|Yes| D{Inside allowed roots?}\n    D -->|Yes| E[Write file]\n    D -->|No| F[Return save_forbidden error]\n```\n\n### Configuring Allowed Roots\n\nSet the environment variable, separating roots with the platform's path delimiter (`:` on POSIX, `;` on Windows):\n\n```bash\n# POSIX\nexport MARKFETCH_ALLOWED_WRITE_ROOTS=\"/tmp:/home/user/docs\"\n\n# Windows (cmd.exe): quote the whole assignment so the quotes are not stored in the value\nset \"MARKFETCH_ALLOWED_WRITE_ROOTS=C:\\Users\\me\\docs;C:\\temp\"\n```\n\nThe sandbox check resolves symlinks and applies case-folding on Windows. Source: [src/sandbox.ts:20-40]()\n\n## Testing\n\n### Running Tests\n\n```bash\nnpm test\n```\n\n### Test Structure\n\nTests use the Node.js built-in test runner (`--test` flag), with tsx providing TypeScript support. Source: [package.json:27]()\n\n### Writing New Tests\n\n1. Place test files in the `tests/` directory\n2. Use the `*.test.ts` naming pattern\n3. 
Run with `tsx --test tests/*.test.ts`\n\n## MCP Inspector\n\nDebug MCP integration using the official inspector:\n\n```bash\nnpm run inspect\n```\n\nThis launches the MCP inspector at `http://localhost:6274`, where you can:\n- Test tool calls interactively\n- Inspect request/response frames\n- Verify schema validation\n\nSource: [package.json:27]()\n\n## Dependencies\n\n### Production Dependencies\n\n| Package | Version | Purpose |\n|---------|---------|---------|\n| `@modelcontextprotocol/sdk` | ^1.29.0 | MCP server implementation |\n| `@mozilla/readability` | ^0.5.0 | Article extraction |\n| `commander` | ^14.0.3 | CLI argument parsing |\n| `linkedom` | ^0.18.0 | HTML parsing |\n| `turndown` | ^7.0.0 | HTML to markdown |\n| `turndown-plugin-gfm` | ^1.0.2 | GFM support |\n| `undici` | ^8.2.0 | HTTP client |\n| `zod` | ^3.0.0 | Schema validation |\n\n### Development Dependencies\n\n| Package | Purpose |\n|---------|---------|\n| `@types/node` | Node.js type definitions |\n| `@types/turndown` | Turndown type definitions |\n| `tsx` | TypeScript execution |\n| `typescript` | TypeScript compiler |\n\nSource: [package.json:30-50]()\n\n## Version History\n\n| Version | Date | Key Changes |\n|---------|------|-------------|\n| 0.6.0 | 2026-05-13 | Write sandbox, Windows CI, save_forbidden error |\n| 0.5.0 | 2026-05-12 | CLI mode, commander dependency |\n| 0.4.1 | 2026-05-11 | README rewrite, bin path fix |\n| 0.4.0 | 2026-05-10 | MCP server with fetch_markdown tool |\n\nSource: [CHANGELOG.md:1-60]()\n\n## Contributing Guidelines\n\n### Code Standards\n\n- All source in TypeScript under `src/`\n- Build output to `dist/` via `npm run build`\n- Tests in `tests/` with the `*.test.ts` pattern\n- No runtime `console.log` in the MCP path (enforced by the lazy-import structure)\n\n### Pull Request Checklist\n\n- [ ] Run `npm run build` successfully\n- [ ] Run `npm test` with all tests passing\n- [ ] Update CHANGELOG.md with changes\n- [ ] Ensure documentation reflects new behavior\n\n### Release 
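Process\n\n```bash\nnpm run prepublishOnly\n```\n\n`prepublishOnly` is the npm lifecycle script that runs the build automatically before `npm publish`. Source: [package.json:29]()\n\n---\n\n## Doramagic Pitfall Log\n\nProject: vasylenko/markfetch\n\nSummary: 7 potential pitfalls found, 0 of them high/blocking; top priority: installation pitfall - source evidence: v0.4.1.\n\n## 1. Installation pitfall · Source evidence: v0.4.1\n\n- Severity: medium\n- Evidence strength: source_linked\n- Finding: GitHub community evidence points to an installation-related issue in this project that still needs verification: v0.4.1\n- User impact: May raise the cost of trial and production adoption for new users.\n- Suggested check: The source suggests a fix, workaround, or version change may already exist; the manual must state the applicable version.\n- Guardrail: Do not inflate this into a definitive conclusion detached from the source link; state the applicable version and review status.\n- Evidence: community_evidence:github | cevd_749b65614f7b40e0b524f4e932cd4aca | https://github.com/vasylenko/markfetch/releases/tag/v0.4.1 | The source discussion mentions node-related conditions; re-verify before installing or trying it out.\n\n## 2. Capability pitfall · Capability assessment relies on assumptions\n\n- Severity: medium\n- Evidence strength: source_linked\n- Finding: README/documentation is current enough for a first validation pass.\n- User impact: If the assumption does not hold, users will not get the promised capability.\n- Suggested check: Turn the assumption into a downstream verification checklist.\n- Guardrail: Assumptions must become verification items; they must not be written as facts until verified.\n- Evidence: capability.assumptions | github_repo:1234238440 | https://github.com/vasylenko/markfetch | README/documentation is current enough for a first validation pass.\n\n## 3. Maintenance pitfall · Maintenance activity unknown\n\n- Severity: medium\n- Evidence strength: source_linked\n- Finding: last_activity_observed is not recorded.\n- User impact: New, abandoned, and active projects get lumped together, lowering trust in the recommendation.\n- Suggested check: Add signals from recent GitHub commits, releases, and issue/PR responses.\n- Guardrail: While maintenance activity is unknown, the recommendation must not be marked high-trust.\n- Evidence: evidence.maintainer_signals | github_repo:1234238440 | https://github.com/vasylenko/markfetch | last_activity_observed missing\n\n## 4. Security/permission pitfall · Downstream validation found risk items\n\n- Severity: medium\n- Evidence strength: source_linked\n- Finding: no_demo\n- User impact: Downstream has already requested a review; this must not be downplayed on the page.\n- Suggested check: Send to the security/permission governance review queue.\n- Guardrail: While downstream risk exists, the review/recommendation must stay downgraded.\n- Evidence: downstream_validation.risk_items | github_repo:1234238440 | https://github.com/vasylenko/markfetch | no_demo; severity=medium\n\n## 5. Security/permission pitfall · Scoring risk present\n\n- Severity: medium\n- Evidence strength: source_linked\n- Finding: no_demo\n- User impact: The risk affects whether the project is suitable for ordinary users to install.\n- Suggested check: Record the risk on the boundary card and confirm whether manual review is needed.\n- Guardrail: Scoring risks must appear on the boundary card, not only as an internal score.\n- Evidence: risks.scoring_risks | github_repo:1234238440 | https://github.com/vasylenko/markfetch | no_demo; severity=medium\n\n## 6. 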
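Maintenance pitfall · Issue/PR response quality unknown\n\n- Severity: low\n- Evidence strength: source_linked\n- Finding: issue_or_pr_quality=unknown.\n- User impact: Users cannot tell whether anyone will maintain the project when they hit a problem.\n- Suggested check: Sample recent issues/PRs to judge whether they go unhandled for long periods.\n- Guardrail: While issue/PR responsiveness is unknown, a maintenance risk warning must be shown.\n- Evidence: evidence.maintainer_signals | github_repo:1234238440 | https://github.com/vasylenko/markfetch | issue_or_pr_quality=unknown\n\n## 7. Maintenance pitfall · Release cadence unclear\n\n- Severity: low\n- Evidence strength: source_linked\n- Finding: release_recency=unknown.\n- User impact: Install commands and docs may lag behind the code, making pitfalls more likely.\n- Suggested check: Confirm that the latest release/tag matches the README install command.\n- Guardrail: While release cadence is unknown or stale, the install instructions must note possible drift.\n- Evidence: evidence.maintainer_signals | github_repo:1234238440 | https://github.com/vasylenko/markfetch | release_recency=unknown\n\n<!-- canonical_name: vasylenko/markfetch; human_manual_source: deepwiki_human_wiki -->\n",
      "summary": "Full DeepWiki/Human Wiki output, with the Discovery Agent pitfall log appended at the end.",
      "title": "Human Manual"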
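    },
    "pitfall_log": {
      "asset_id": "pitfall_log",
      "filename": "PITFALL_LOG.md",
      "markdown": "# Pitfall Log\n\nProject: vasylenko/markfetch\n\nSummary: 7 potential pitfalls found, 0 of them high/blocking; top priority: installation pitfall - source evidence: v0.4.1.\n\n## 1. Installation pitfall · Source evidence: v0.4.1\n\n- Severity: medium\n- Evidence strength: source_linked\n- Finding: GitHub community evidence points to an installation-related issue in this project that still needs verification: v0.4.1\n- User impact: May raise the cost of trial and production adoption for new users.\n- Suggested check: The source suggests a fix, workaround, or version change may already exist; the manual must state the applicable version.\n- Guardrail: Do not inflate this into a definitive conclusion detached from the source link; state the applicable version and review status.\n- Evidence: community_evidence:github | cevd_749b65614f7b40e0b524f4e932cd4aca | https://github.com/vasylenko/markfetch/releases/tag/v0.4.1 | The source discussion mentions node-related conditions; re-verify before installing or trying it out.\n\n## 2. Capability pitfall · Capability assessment relies on assumptions\n\n- Severity: medium\n- Evidence strength: source_linked\n- Finding: README/documentation is current enough for a first validation pass.\n- User impact: If the assumption does not hold, users will not get the promised capability.\n- Suggested check: Turn the assumption into a downstream verification checklist.\n- Guardrail: Assumptions must become verification items; they must not be written as facts until verified.\n- Evidence: capability.assumptions | github_repo:1234238440 | https://github.com/vasylenko/markfetch | README/documentation is current enough for a first validation pass.\n\n## 3. Maintenance pitfall · Maintenance activity unknown\n\n- Severity: medium\n- Evidence strength: source_linked\n- Finding: last_activity_observed is not recorded.\n- User impact: New, abandoned, and active projects get lumped together, lowering trust in the recommendation.\n- Suggested check: Add signals from recent GitHub commits, releases, and issue/PR responses.\n- Guardrail: While maintenance activity is unknown, the recommendation must not be marked high-trust.\n- Evidence: evidence.maintainer_signals | github_repo:1234238440 | https://github.com/vasylenko/markfetch | last_activity_observed missing\n\n## 4. Security/permission pitfall · Downstream validation found risk items\n\n- Severity: medium\n- Evidence strength: source_linked\n- Finding: no_demo\n- User impact: Downstream has already requested a review; this must not be downplayed on the page.\n- Suggested check: Send to the security/permission governance review queue.\n- Guardrail: While downstream risk exists, the review/recommendation must stay downgraded.\n- Evidence: downstream_validation.risk_items | github_repo:1234238440 | https://github.com/vasylenko/markfetch | no_demo; severity=medium\n\n## 5. Security/permission pitfall · Scoring risk present\n\n- Severity: medium\n- Evidence strength: source_linked\n- Finding: no_demo\n- User impact: The risk affects whether the project is suitable for ordinary users to install.\n- Suggested check: Record the risk on the boundary card and confirm whether manual review is needed.\n- Guardrail: Scoring risks must appear on the boundary card, not only as an internal score.\n- Evidence: risks.scoring_risks | github_repo:1234238440 | https://github.com/vasylenko/markfetch | no_demo; severity=medium\n\n## 6. 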
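Maintenance pitfall · Issue/PR response quality unknown\n\n- Severity: low\n- Evidence strength: source_linked\n- Finding: issue_or_pr_quality=unknown.\n- User impact: Users cannot tell whether anyone will maintain the project when they hit a problem.\n- Suggested check: Sample recent issues/PRs to judge whether they go unhandled for long periods.\n- Guardrail: While issue/PR responsiveness is unknown, a maintenance risk warning must be shown.\n- Evidence: evidence.maintainer_signals | github_repo:1234238440 | https://github.com/vasylenko/markfetch | issue_or_pr_quality=unknown\n\n## 7. Maintenance pitfall · Release cadence unclear\n\n- Severity: low\n- Evidence strength: source_linked\n- Finding: release_recency=unknown.\n- User impact: Install commands and docs may lag behind the code, making pitfalls more likely.\n- Suggested check: Confirm that the latest release/tag matches the README install command.\n- Guardrail: While release cadence is unknown or stale, the install instructions must note possible drift.\n- Evidence: evidence.maintainer_signals | github_repo:1234238440 | https://github.com/vasylenko/markfetch | release_recency=unknown\n",
      "summary": "The identity, installation, configuration, runtime, and security pitfalls users are most likely to hit before hands-on use.",
      "title": "Pitfall Log"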
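    },
    "prompt_preview": {
      "asset_id": "prompt_preview",
      "filename": "PROMPT_PREVIEW.md",
      "markdown": "# markfetch - Prompt Preview\n\n> Copy the prompt below into the AI you normally use and try it once; no installation is needed.\n> Its goal is to let you experience how this project works directly, rather than read a project introduction.\n\n## Copy this prompt\n\n```text\nExecute this prompt directly; do not analyze, polish, or summarize it, or ask how I want this Prompt Preview handled.\n\nYou are now playing the “pre-install trial version” of markfetch.\nThis is not a project introduction, an evaluation report, or a README summary. Your task is to let me experience its core service at minimal cost.\n\nMy trial task: I want to use it to complete a real tool connection and integration task.\nMy usual host AI: MCP Client\n\n[Experience goal]\nAround my real task, demonstrate live how this project turns input into example guidance and judgment cues. The focus is on letting me feel how it works, not on giving me project background.\n\n[Workflow constraints]\n- You must behave like a project capability pack that is actively providing a service, not like a lecturer.\n- Advance only one step per turn; after asking a question, stop and wait for my answer.\n- Every step must let me experience one concrete service action: clarifying, organizing, planning, checking, judging, or wrapping up.\n- Every step must state: the current goal, what you need from me, and what you will produce after I answer.\n- Do not install anything, run commands, or write code, and do not claim that tests passed or that files were modified.\n- For anything that can only be verified after a real install or host load, state explicitly: “this step needs to be verified after installation”.\n- If I say “continue with an example”, you may proceed with a fictional example, but you still must not claim real execution.\n\n[Capabilities available for trial]\n- Pre-install capability preview: Tiny CLI and MCP server: fetch an URL -- return clean markdown. Built for AI agents. Input: user task, current AI conversation context; output: example guidance, judgment cues.\n\n[Capabilities that can only be verified after installation]\n- Command-line startup or install flow: the project docs contain executable commands; real use requires running them in a local or host environment. Input: terminal environment, package manager, project dependencies; output: install result, list/update/run results.\n\n[Core service flow]\nWalk me through strictly in this order. Do not output the whole flow at once:\n1. introduction: Introduction. Simulate one user task around “Introduction”, without showing install or run results.\n2. quickstart: Quick Start Guide. Simulate one user task around “Quick Start Guide”, without showing install or run results.\n3. processing-pipeline: Processing Pipeline. Simulate one user task around “Processing Pipeline”, without showing install or run results.\n4. http-fingerprinting: HTTP/2 Fingerprinting. Simulate one user task around “HTTP/2 Fingerprinting”, without showing install or run results.\n5. cli-usage: CLI Usage. Simulate one user task around “CLI Usage”, without showing install or run results.\n\n[Core capability walkthrough script]\nEvery step must follow “input -> service action -> intermediate artifact”. Do not just name the step:\n1. introduction\nInput: information related to “Introduction” provided by the user.\nService action: simulate the project's core judgment and organizing approach at this step.\nIntermediate artifact: a small, checkable result.\n\n2. quickstart\nInput: information related to “Quick Start Guide” provided by the user.\nService action: simulate the project's core judgment and organizing approach at this step.\nIntermediate artifact: a small, checkable result.\n\n3. processing-pipeline\nInput: information related to “Processing Pipeline” provided by the user.\nService action: simulate the project's core judgment and organizing approach at this step.\nIntermediate artifact: a small, checkable result.\n\n4. http-fingerprinting\nInput: information related to “HTTP/2 Fingerprinting” provided by the user.\nService action: simulate the project's core judgment and organizing approach at this step.\nIntermediate artifact: a small, checkable result.\n\n5. 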
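cli-usage\nInput: information related to “CLI Usage” provided by the user.\nService action: simulate the project's core judgment and organizing approach at this step.\nIntermediate artifact: a small, checkable result.\n\n[Project service rules]\nThese rules govern how you serve the user. Do not explain the rules themselves; follow them at every step:\n- Confirm the user's task, input materials, and success criteria before simulating project capabilities.\n- Every step must produce a small, checkable artifact and wait for user confirmation before continuing.\n- Any capability that requires installation, tool calls, or access to external services must be marked as verify-after-install.\n\n[Per-step service constraints]\n- Step 1 / introduction: Step 1 must produce a small intermediate artifact around “Introduction” and wait for user confirmation.\n- Step 2 / quickstart: Step 2 must produce a small intermediate artifact around “Quick Start Guide” and wait for user confirmation.\n- Step 3 / processing-pipeline: Step 3 must produce a small intermediate artifact around “Processing Pipeline” and wait for user confirmation.\n- Step 4 / http-fingerprinting: Step 4 must produce a small intermediate artifact around “HTTP/2 Fingerprinting” and wait for user confirmation.\n- Step 5 / cli-usage: Step 5 must produce a small intermediate artifact around “CLI Usage” and wait for user confirmation.\n\n[Boundaries and risks]\n- Do not claim to have installed or run anything, called an API, read or written local files, or completed a real task.\n- A pre-install preview can only show how the project works; it cannot prove compatibility, performance, or output quality.\n- Capabilities involving installation, plugin loading, tool calls, or external services must be verified after installation.\n\n[Traceable sources]\nThese paths are only for your internal checks, or for brief citation when I ask “what is this based on”. Do not expand on them unprompted in your first reply:\n- https://github.com/vasylenko/markfetch\n- https://github.com/vasylenko/markfetch#readme\n- README.md\n- src/index.ts\n- package.json\n- src/core.ts\n- src/cli.ts\n\n[First-question rules]\n- The first three questions must confirm the user's goal, success criteria, and boundaries first; do not jump ahead into tools, installation, or implementation details.\n- If technical prerequisites, file paths, or the runtime environment are needed later, ask only after the user has confirmed the goal.\n\nYour first reply must contain only the following 4 parts:\n1. Trial start: in 1 sentence, state that you will walk me through markfetch's core service.\n2. Current step: explicitly enter Step 1 and state what this step resolves.\n3. How you will serve me: state which action in how I complete my task you will change first.\n4. Ask me only 3 questions, then stop and wait for my answers.\n\nYour first reply must not include: the full remaining flow, an evidence list, install commands, project evaluations, marketing copy, or any claim of having installed or run anything.\n\nSecond-round protocol for Step 1 / introduction:\n- After I answer the first three questions, stay in Step 1 / introduction; do not move to Step 2.\n- Your second reply must produce 6 parts: the clarified task definition, success criteria, boundary conditions,\n  2-3 candidate options, the trade-offs of each option, and a recommended option.\n- Your second reply must end by asking whether I confirm the recommended option; move to the next step only after I explicitly confirm.\n- Your second reply must not include git worktrees, code plans, test files, commands, or real execution results.\n\nRules for the rest of the conversation:\n- After I answer, first complete the current step's intermediate artifact and wait for confirmation; move to the next step only after I confirm.\n- Every step must generate a small intermediate artifact, such as a clarified goal, a draft plan, a test intent, a verification checklist, or a continue/stop decision.\n- Phrase every demonstration as “I would suggest / I would guide / this step would produce”, not as something already executed.\n- Do not claim that tests passed, files were modified, commands were run, or results were produced.\n- If a capability must be verified after installation, say directly: “this step needs to be verified after installation”.\n- If the evidence is insufficient, say explicitly “insufficient evidence”; do not fill in facts.\n```\n",
      "summary": "A safe trial prompt that conveys how the project's capabilities flow without installing it.",
      "title": "Prompt Preview / Pre-install Trial Prompt"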
    },
    "quick_start": {
      "asset_id": "quick_start",
      "filename": "QUICK_START.md",
      "markdown": "# Quick Start / Official Entry Point\n\nProject: vasylenko/markfetch\n\n## Official install entry point\n\n### Node.js / npm · Official install entry point\n\n```bash\nnpm i -g markfetch\n```\n\nSource: https://github.com/vasylenko/markfetch#readme\n\n## Sources\n\n- repo: https://github.com/vasylenko/markfetch\n- docs: https://github.com/vasylenko/markfetch#readme\n",
      "summary": "The getting-started entry point extracted from the project's official README or install docs.",
      "title": "Quick Start / Official Entry Point"
    }
  },
  "validation_id": "dval_340184719b7f4ddea815de0bc4647491"
}
