# osworld - Doramagic AI Context Pack

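> Positioning: a pre-installation experience and decision asset. It gives the host AI a good start, but it does not mean the target project has been installed, executed, or verified.

## Sufficiency principle

- **Sufficiency, not compression**: The AI Context Pack should be complete enough for the host AI to understand the project's value, capability boundaries, entry points, risks, and evidence sources before starting work. It may be organized in layers, but the shortest possible summary is not the goal.
- **Compression strategy**: Compress only noise and repetition; do not compress context that affects judgment or the quality of the start.

## How the host AI should use this

You are reading the AI Context Pack that Doramagic compiled for osworld. Treat it as pre-work context: help the user understand who it suits, what it can do, how to start, what must be verified after installation, and where the risks are. Do not claim that you have installed, run, or executed the target project.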

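## Claim consumption rules

- **Source of facts**: Repo Evidence + Claim/Evidence Graph; the Human Wiki only supplies salience, terminology, and narrative structure.
- **Minimum status for a fact**: `supported`
- `supported`: may be used as a project fact, but the answer must cite the claim_id and the evidence path.
- `weak`: may only be used as a low-confidence lead, and the user must be asked to verify it further.
- `inferred`: may only be used for risk notes or open questions, and must not be presented as a project fact.
- `unverified`: must not be used as a fact; state explicitly that the evidence is insufficient.
- `contradicted`: the conflicting sources must be shown; do not pick one version on the user's behalf.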
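
The status rules above can be read as a small lookup. The following is a minimal sketch, assuming a hypothetical `Claim` record; the five status names come from this pack, while the field names and the returned wording are illustrative only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Claim:
    # Hypothetical record shape; only the status values come from the pack.
    claim_id: str
    status: str
    confidence: float
    evidence_path: Optional[str] = None


def usage_for(claim: Claim) -> str:
    """Return how a host AI may use a claim under the rules above."""
    if claim.status == "supported":
        # Facts must carry both the claim id and the evidence path.
        return f"fact (cite {claim.claim_id} and {claim.evidence_path})"
    if claim.status == "weak":
        return "low-confidence lead (ask the user to verify)"
    if claim.status == "inferred":
        return "risk note or open question only"
    if claim.status == "unverified":
        return "not a fact (state that evidence is insufficient)"
    if claim.status == "contradicted":
        return "show every conflicting source (do not pick one)"
    raise ValueError(f"unknown claim status: {claim.status}")


print(usage_for(Claim("clm_0002", "supported", 0.86, "README.md")))
```

Raising on an unknown status, rather than defaulting to "fact", keeps a malformed claim from silently passing the `supported` threshold.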

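## Who it suits best

- **AI researchers or builders of research-oriented agents**: The README is clearly organized around research, experiment, or paper workflows. Evidence: `README.md` Claim: `clm_0002` supported 0.86

## What it can do

- **Command-line startup or installation flow** (requires post-install verification): The project documentation contains executable commands; real use requires running them locally or in the host environment. Evidence: `README.md` Claim: `clm_0001` supported 0.86

## How to start

- `git clone https://github.com/xlang-ai/OSWorld` Evidence: `README.md` Claim: `clm_0003` supported 0.86
- `pip install -r requirements.txt` Evidence: `README.md` Claim: `clm_0004` supported 0.86
- `pip install desktop-env` Evidence: `README.md` Claim: `clm_0005` supported 0.86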

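## Pre-continuation decision card

- **Current recommendation**: Try role matching first
- **Why**: This project looks more like a role library; the core risk is choosing the wrong role or mistaking role copy for execution capability. Use Prompt Preview to test role matching first, then decide whether to import it into a sandbox.

### 30-second judgment

- **What to do now**: Try role matching first
- **Minimal safe next step**: Test role matching with Prompt Preview first; import in isolation only once satisfied
- **Do not trust yet**: Role quality and task fit cannot be trusted directly.
- **Continuing will touch**: role selection bias, command execution, the local environment or project files

### What you can trust now

- **Audience hint: AI researchers or builders of research-oriented agents** (supported): Backed by a supported claim or project evidence, but this still does not equal real installed behavior. Evidence: `README.md` Claim: `clm_0002` supported 0.86
- **Capability exists: command-line startup or installation flow** (supported): You can trust that the project contains hints of this capability; whether it fits your specific task still needs a trial or post-install verification. Evidence: `README.md` Claim: `clm_0001` supported 0.86
- **Quick Start / installation command hints exist** (supported): You can trust that the project documentation shows a startup or installation entry point; do not run it directly in your main environment on that basis. Evidence: `README.md` Claim: `clm_0003` supported 0.86

### What you cannot trust yet

- **Role quality and task fit cannot be trusted directly.** (unverified): A role library proves that many roles exist; it does not prove that every role suits your specific task or that a role produces high-quality results.
- **Role copy must not be mistaken for real execution capability.** (unverified): Before installation you can only judge whether the role description matches the task profile; you cannot prove that it can complete tasks inside the host AI.
- **Real output quality cannot be trusted before installation.** (unverified): Prompt Preview only shows how the guidance works; it does not prove result quality in a real project.
- **Host AI version compatibility cannot be trusted before installation.** (unverified): Loading rules and version differences across hosts such as Claude, Cursor, Codex, and Gemini must be verified in a real environment.
- **"It will not pollute existing host AI behavior" cannot be trusted directly.** (inferred): Skill, plugin, and AGENTS/CLAUDE/GEMINI instructions may change the host AI's default behavior.
- **Safe rollback cannot be assumed.** (unverified): Unless the project explicitly provides uninstall and restore instructions, verify in an isolated environment first.
- **Is it compatible with the user's current host AI version after a real installation?** (unverified): Compatibility can only be verified in the actual host environment.
- **Does the project's output quality meet the user's specific task?** (unverified): A pre-install preview only shows flow and boundaries; it cannot replace real evaluation.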

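### What continuing will touch

- **Role selection bias**: The user's judgment about which expert role should handle the task. Reason: choosing the wrong role makes the AI answer from the wrong professional perspective, wasting time or misleading decisions.
- **Command execution**: Package managers, network downloads, local plugin directories, project configuration, or the user's home directory. Reason: running the first command can already change the environment, so decide first whether it is worth running. Evidence: `README.md`
- **Local environment or project files**: Installation results, plugin caches, project configuration, or local dependency directories. Reason: before installation, the write scope and rollback method cannot be proven, so isolated verification is needed. Evidence: `README.md`
- **Host AI context**: AI Context Pack, Prompt Preview, Skill routing, risk rules, and project facts. Reason: imported context affects the host AI's later judgments, so unverified items must not be packaged as facts.

### Minimal safe next step

- **Run Prompt Preview first**: Use an interactive trial to validate the task profile and role match first; do not import the whole role library up front. (Applies to: any project, especially when output quality is unknown.)
- **Trial-install only in an isolated directory or test account**: Keep installation commands from polluting your main host AI, real projects, or home directory. (Applies when: there are hints of command execution, plugin configuration, or local writes.)
- **After installation, verify only one minimal task**: Verify loading, compatibility, output quality, and rollback first, then decide whether to go deeper. (Applies when: you are about to move from trial to a real workflow.)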
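
One way to keep a trial install isolated is to write the plan down for review before anything runs. The sketch below only builds a list of command strings and executes nothing. The sandbox directory name and the virtual-environment step are assumptions added for illustration; the last three commands are quoted from the Quick Start claims in this pack, and whether they need further steps between them is not confirmed by the evidence here.

```python
from pathlib import Path
from typing import List

# Quoted from the Quick Start entries (clm_0003 to clm_0005); not executed.
QUICK_START = [
    "git clone https://github.com/xlang-ai/OSWorld",
    "pip install -r requirements.txt",
    "pip install desktop-env",
]


def plan_isolated_trial(base_dir: Path) -> List[str]:
    """Return a reviewable command plan rooted in a throwaway directory."""
    sandbox = base_dir / "osworld-trial"  # hypothetical directory name
    venv_dir = sandbox / ".venv"          # assumption: isolate with venv
    setup = [
        f"mkdir -p {sandbox}",
        f"python -m venv {venv_dir}",
        f"cd {sandbox}",
    ]
    return setup + QUICK_START


for step in plan_isolated_trial(Path("/tmp")):
    print(step)
```

Printing the plan instead of running it matches the pack's boundary: the host AI supplies the checklist, and the user decides whether and where to execute it.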

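### Exit paths

- **Preserve the pre-install state**: Record the original host configuration and project state; only then can you later judge whether recovery is possible.
- **Keep a record of the original role choice**: If the output drifts off topic, return to the task-profile stage and choose a role again rather than pushing on with the wrong one.
- **Record installation commands and write paths**: Without explicit uninstall instructions, you at least need to know which directories or configuration need manual cleanup.
- **Without a rollback path, stay out of the main environment**: Irreversibility is a blocker to continuing; do not proceed on trust or luck.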
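
Recording write paths can be as simple as comparing a directory listing taken before and after a trial install. This is a minimal sketch using only the standard library; the function names are hypothetical, and it only detects files that appear under the directory you choose to watch, not edits to existing files or writes elsewhere.

```python
import tempfile
from pathlib import Path
from typing import List, Set


def snapshot(root: Path) -> Set[str]:
    """Collect every path under root, relative to root."""
    return {str(p.relative_to(root)) for p in root.rglob("*")}


def new_paths(before: Set[str], after: Set[str]) -> List[str]:
    """Paths present after the install step but not before it."""
    return sorted(after - before)


# Demonstration on a temporary directory standing in for the sandbox.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    before = snapshot(root)
    (root / "installed.cfg").write_text("example")  # simulated write
    after = snapshot(root)
    print(new_paths(before, after))
```

Keeping the resulting list alongside the commands that were run gives a manual cleanup checklist when the project offers no uninstall instructions.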

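## What is preview-only

- Explaining who the project suits and what it can do
- Demonstrating a typical conversation flow based on the project documentation
- Helping the user judge whether it is worth installing or researching further

## What must be verified after installation

- Actually installing the Skill, plugin, or CLI
- Running scripts, modifying local files, or accessing external services
- Verifying real output quality, performance, and compatibility

## Boundary and risk card

- **Mistaking the pre-install preview for a real run**: The user may overestimate how much configuration, permission, and compatibility verification the project has already completed. Handling: clearly distinguish prompt_preview_can_do from runtime_required. Claim: `clm_0006` inferred 0.45
- **Command execution modifies the local environment**: Installation commands may write to the user's home directory, the host plugin directory, or project configuration. Handling: run in an isolated environment or test account first. Evidence: `README.md` Claim: `clm_0007` supported 0.86
- **To confirm**: Is it compatible with the user's current host AI version after a real installation? Reason: compatibility can only be verified in the actual host environment.
- **To confirm**: Does the project's output quality meet the user's specific task? Reason: a pre-install preview only shows flow and boundaries; it cannot replace real evaluation.
- **To confirm**: Do the installation commands require network access, permissions, or global writes? Reason: this affects installation risk in both enterprise and personal environments.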

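## Pre-work context

### Load order

- First read how_to_use.host_ai_instruction to establish the boundaries of this pre-install decision asset.
- Read claim_graph_summary to confirm that facts come from the Claim/Evidence Graph, not the Human Wiki narrative.
- Then read intended_users, capabilities, and quick_start_candidates to judge whether the user is a match.
- When a specific task needs to be performed, check role_skill_index first, then evidence_index.
- For real installation, file modification, network access, performance, or compatibility questions, switch to risk_card and boundaries.runtime_required.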
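
The first three steps of the load order can be sketched as a fixed key sequence walked over the pack. This assumes the pack is available as a nested dictionary, which this document does not confirm; the key names are taken from the list above, and the helper names are hypothetical.

```python
from typing import Any, List, Tuple

# Keys in the order listed above; dotted keys address nested fields.
LOAD_ORDER = [
    "how_to_use.host_ai_instruction",
    "claim_graph_summary",
    "intended_users",
    "capabilities",
    "quick_start_candidates",
]


def lookup(pack: dict, dotted_key: str) -> Any:
    """Resolve a dotted key against nested dicts; None when absent."""
    node: Any = pack
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def load_in_order(pack: dict) -> List[Tuple[str, Any]]:
    """Return (key, value) pairs in the prescribed reading order."""
    return [(key, lookup(pack, key)) for key in LOAD_ORDER]


example_pack = {"how_to_use": {"host_ai_instruction": "preview only"}}
for key, value in load_in_order(example_pack):
    print(key, "->", value)
```

A missing section comes back as `None` instead of raising, so the caller can treat it as insufficient evidence rather than guess at its content.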

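### Task routing

- **Command-line startup or installation flow**: First explain that this capability requires post-install verification, then provide a pre-install checklist. Boundary: must be verified by real installation or execution. Evidence: `README.md` Claim: `clm_0001` supported 0.86

### Context scale

- Total files: 1105
- Important files covered: 40/1105
- Evidence index entries: 77
- Role / Skill entries: 24

### Handling insufficient evidence

- **missing_evidence**: State that evidence is insufficient and ask the user for the target file, README passage, or post-install verification record; do not fill in facts.
- **out_of_scope_request**: State that the task is outside the evidence scope of the current AI Context Pack, and suggest that the user consult the Human Manual or verify after a real installation.
- **runtime_request**: Provide the pre-install checklist and the source of the commands, but do not run commands for the user or claim to have run them.
- **source_conflict**: Show the conflicting sources side by side, mark them as pending verification, and do not force a choice of one version.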
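
The four cases above amount to a dispatch table. Here is a minimal sketch; the case names come from this pack, while the response strings and the function name are illustrative paraphrases, not fixed wording.

```python
# Case names are from the pack; response texts are paraphrased examples.
FALLBACKS = {
    "missing_evidence": (
        "Evidence is insufficient. Please provide the target file, README "
        "passage, or post-install verification record."
    ),
    "out_of_scope_request": (
        "This is outside the evidence scope of the AI Context Pack. Consult "
        "the Human Manual or verify after a real installation."
    ),
    "runtime_request": (
        "Here is the pre-install checklist and the command source. "
        "No command has been run on your behalf."
    ),
    "source_conflict": (
        "The sources conflict. Both are shown and marked as pending "
        "verification."
    ),
}


def fallback_for(case: str) -> str:
    """Return the response for a known case; reject unknown case names."""
    if case not in FALLBACKS:
        raise ValueError(f"unknown fallback case: {case}")
    return FALLBACKS[case]


print(fallback_for("runtime_request"))
```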

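## Prompt Recipes

### Fit assessment

- Goal: Judge whether this project suits the user's current task.
- Expected output: Fit conclusion, key reasons, evidence citations, what can be previewed before installation, what must be verified after installation, and recommended next steps.

```text
Based on the AI Context Pack for osworld, first ask me 3 necessary questions, then judge whether it suits my task. The answer must cover: who it suits, what it can do, what it cannot do, whether it is worth installing, and where the evidence comes from. Every project fact must cite evidence_refs, source_paths, or a claim_id.
```

### Pre-install experience

- Goal: Let the user get a feel for the core workflow before installation, without packaging the preview as real capability or a marketing promise.
- Expected output: An experience script with boundary labels, a post-install verification checklist, and cautious advice; no promises of real execution and no strong marketing language.

```text
Treat osworld as a pre-install experience asset, not as an installed tool or a real runtime environment.

Output exactly four sections:
1. First ask me 3 necessary questions.
2. Give an "experience script": use the three labels [previewable before install], [must verify after install], and [insufficient evidence] to show how it might guide the workflow.
3. Give a post-install verification checklist: list which capabilities can only be confirmed after a real installation, real host loading, and a real project run.
4. Give cautious advice: you may only say "worth further research / a trial install", "provide more information before judging", or "not recommended to continue"; do not endorse the project.

Hard boundaries:
- Do not claim to have installed, run, executed tests, modified files, or produced real results.
- Do not write promissory phrases such as "adapts automatically", "guaranteed to pass", "perfect fit", or "strongly recommend installing".
- When describing how it works after installation, you must use a conditional such as "if installation succeeds and the host loads the Skill correctly, it might ...".
- The experience script may only be written as "sample lines / a hypothetical flow": use "might ask / might suggest / might show", not "has written, has generated, has passed, is running, is generating".
- Prompt Preview is not responsible for giving installation commands; if the user is ready for a trial install, only advise reading the Quick Start and Risk Card first and verifying in an isolated environment.
- Every project fact must come from a supported claim, evidence_refs, or source_paths; inferred/unverified items may only serve as risks or open questions.

```

### Role / Skill selection

- Goal: Pick the best-matching assets from the project's roles or Skills.
- Expected output: A list of candidate roles or Skills, each with applicable scenarios, evidence paths, risk boundaries, and whether post-install verification is required.

```text
Read role_skill_index and recommend the 3-5 most relevant roles or Skills for my target task. Each recommendation must state the applicable scenario, likely output, risk boundaries, and evidence_refs.
```

### Risk pre-check

- Goal: Identify environment, permission, rule-conflict, and quality risks before installing or adopting the project.
- Expected output: A checklist covering environment, permissions, dependencies, licensing, host conflicts, quality risks, and unknowns.

```text
Based on risk_card, boundaries, and quick_start_candidates, give me a pre-install risk checklist. Do not run commands for me; only explain what I should check, why, and what happens if the check fails.
```

### Host AI kickoff instruction

- Goal: Turn the project context into an instruction for the host AI before a conversation starts.
- Expected output: A kickoff instruction with clear boundaries and clear evidence citations, suitable for pasting into the host AI.

```text
Based on the AI Context Pack for osworld, generate a kickoff instruction that I can paste into the host AI. The instruction must honor not_runtime=true and must not claim that the project has been installed, run, or produced real results.
```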

## Role / Skill index

- 24 role / Skill / project document entries indexed in total.

- **📢 Updates** (project_doc): Website • Paper • Doc • Data • Data Viewer • Discord • Cache Activation hint: refer to this when the user needs to understand the project structure, installation method, or boundaries. Evidence: `README.md`
- **Evaluation examples** (project_doc): Here we put the data examples to benchmark the ability of agents when interacting with GUI. The examples are stored in ./examples where each data item formatted as: Activation hint: refer to this when the user needs to understand the project structure, installation method, or boundaries. Evidence: `evaluation_examples/README.md`
- **Agent** (project_doc): Supported Models We currently support the following models as the foundational models for the agents: - GPT-3.5 gpt-3.5-turbo-16k, ... - GPT-4 gpt-4-0125-preview, gpt-4-1106-preview, ... - GPT-4V gpt-4-vision-preview, ... - Gemini-Pro - Gemini-Pro-Vision - Claude-3, 2 claude-3-haiku-2024030, claude-3-sonnet-2024022, ... - ... Activation hint: refer to this when the user needs to understand the project structure, installation method, or boundaries. Evidence: `mm_agents/README.md`
- **OSWorld Monitor** (project_doc): A web-based monitoring dashboard for OSWorld tasks and executions. Activation hint: refer to this when the user needs to understand the project structure, installation method, or boundaries. Evidence: `monitor/README.md`
- **Scripts Directory** (project_doc): This directory contains all the run scripts for OSWorld, organized by type. Activation hint: refer to this when the user needs to understand the project structure, installation method, or boundaries. Evidence: `scripts/README.md`
- **Evaluator Setup Details** (project_doc): Evaluator Setup Details Setup scaffolding for the evaluators in the desktop environment for those who want to know the details of the evaluator setup for customized evaluation and extension Activation hint: refer to this when the user needs to understand the project structure, installation method, or boundaries. Evidence: `desktop_env/evaluators/README.md`
- **Readme** (project_doc):  Activation hint: refer to this when the user needs to understand the project structure, installation method, or boundaries. Evidence: `desktop_env/providers/README.md`
- **Server setup** (project_doc): This README is useful if you want to set up your own machine for the environment. This README is not yet finished. Please contact the author if you need any assistance. Activation hint: refer to this when the user needs to understand the project structure, installation method, or boundaries. Evidence: `desktop_env/server/README.md`
- **Anthropic Agent Integration** (project_doc): Anthropic Agent Integration Notice: As Anthropic API only supports image’s long edge is less than 1568 pixels and image is less than ~1,600 tokens, we resize the screenshot to 1280x720. Setup To run with the Anthropic API, you need to set up your environment with the necessary API keys and configurations. Follow these steps: 1. Install Dependencies : Ensure you have the required Python packages installed. You can do… Activation hint: refer to this when the user needs to understand the project structure, installation method, or boundaries. Evidence: `mm_agents/anthropic/README.md`
- **aworldGUIAgent-v1** (project_doc): aworldGUIAgent-v1 built on the AWorld Framework https://github.com/inclusionAI/AWorld , specifically designed to tackle complex desktop automation tasks within the OSWorld-verified https://os-world.github.io/ benchmark. Activation hint: refer to this when the user needs to understand the project structure, installation method, or boundaries. Evidence: `mm_agents/aworldguiagent/README.md`
- **Readme** (project_doc): 1. Get the URLs from majestic million and save them to majestic million.csv 2. Run scrapy spider to get the data from the URLs Activation hint: refer to this when the user needs to understand the project structure, installation method, or boundaries. Evidence: `mm_agents/gui_som/data_preparation/README.md`
- **Kimi Agent** (project_doc): KimiAgent is a computer use agent powered by the Kimi models developed by Moonshot AI. It observes the desktop through screenshots and emits pyautogui code for the OSWorld GUI executor to run. Activation hint: refer to this when the user needs to understand the project structure, installation method, or boundaries. Evidence: `mm_agents/kimi/README.md`
- **Deploy CogAgent as server** (project_doc): The CogAgent LLM will be deployed on http://127.0.0.1:8000 Activation hint: refer to this when the user needs to understand the project structure, installation method, or boundaries. Evidence: `mm_agents/llm_server/CogAgent/README.md`
- **M3 Agent** (project_doc): Standalone agent module for evaluating an M3-trained model on OSWorld https://github.com/xlang-ai/OSWorld . Provides the system prompt, response parser and screenshot handling needed to drive M3 through an Anthropic-Messages-compatible endpoint. Activation hint: refer to this when the user needs to understand the project structure, installation method, or boundaries. Evidence: `mm_agents/m3/README.md`
- **Maestro Utilities** (project_doc): This directory contains various utility functions for the Maestro project to improve code reusability and maintainability. Activation hint: refer to this when the user needs to understand the project structure, installation method, or boundaries. Evidence: `mm_agents/maestro/utils/README.md`
- **Pointer Agent Integration** (project_doc): 1. Setup Environment : You need to create a virutalenv and install requirements: Activation hint: refer to this when the user needs to understand the project structure, installation method, or boundaries. Evidence: `mm_agents/pointer/README.md`
- **Qwen Agents** (project_doc): This package contains OpenAI-compatible Qwen vision agents for OSWorld. Activation hint: refer to this when the user needs to understand the project structure, installation method, or boundaries. Evidence: `mm_agents/qwen/README.md`
- **Surfer H — OSWorld Benchmark Runner** (project_doc): Surfer H — OSWorld Benchmark Runner Activation hint: refer to this when the user needs to understand the project structure, installation method, or boundaries. Evidence: `mm_agents/surferH/README.md`
- **UiPath Screen Agent** (project_doc): 23 Dec 2025 - Updated the planner model to Claude 4.5 Opus https://www.anthropic.com/news/claude-opus-4-5 - Updated the grounder model to an internally finetuned version of Qwen3-VL https://github.com/QwenLM/Qwen3-VL and allowing it to predict "refusal" similar to OSWorld-G for elements that do not exist - Added memory for storing relevant information across steps - Improved utilization of the UI element detector fo… Activation hint: refer to this when the user needs to understand the project structure, installation method, or boundaries. Evidence: `mm_agents/uipath/README.md`
- **VLAA-GUI** (project_doc): VLAA-GUI is a multi-agent system for desktop GUI automation. VLAA-GUI decomposes complex desktop tasks into perception, planning, and action through a pipeline of specialized sub-agents. Activation hint: refer to this when the user needs to understand the project structure, installation method, or boundaries. Evidence: `mm_agents/vlaa_gui/README.md`
- **OSWorld Setup and Evaluation Guide** (project_doc): This comprehensive guide covers all aspects of setting up and running OSWorld evaluations, including account configuration, proxy setup, and public evaluation platform deployment. Activation hint: refer to this when the user needs to understand the project structure, installation method, or boundaries. Evidence: `SETUP_GUIDELINE.md`
- **Aliyun ECS Provider Configuration Guide** (project_doc): Aliyun ECS Provider Configuration Guide Activation hint: refer to this when the user needs to understand the project structure, installation method, or boundaries. Evidence: `desktop_env/providers/aliyun/ALIYUN_GUIDELINE.md`
- **Aliyun ECS Provider Configuration Guide (Chinese version)** (project_doc): 1. Alibaba Cloud account: You need a valid Alibaba Cloud account. This script launches ECS instances on a pay-as-you-go basis by default, so the account balance must stay above 100. 2. Access keys: Create an AccessKey ID and AccessKey Secret in the Alibaba Cloud RAM access control console and grant ECS control permissions. 3. VPC setup: Create a VPC, vSwitch, and security group in the target region. 4. Custom image: Create the OSWorld custom image. 5. It is recommended to complete the ECS creation flow manually once and record all required environment variable values. Activation hint: refer to this when the user needs to understand the project structure, installation method, or boundaries. Evidence: `desktop_env/providers/aliyun/ALIYUN_GUIDELINE_CN.md`
- **☁ Configuration of AWS** (project_doc): Welcome to the AWS VM Management documentation. Before you proceed with using the code to manage AWS services, please ensure the following variables are set correctly according to your AWS environment. Activation hint: refer to this when the user needs to understand the project structure, installation method, or boundaries. Evidence: `desktop_env/providers/aws/AWS_GUIDELINE.md`

## Evidence index

- 77 evidence entries indexed in total.

- **📢 Updates** (documentation): Website • Paper • Doc • Data • Data Viewer • Discord • Cache Evidence: `README.md`
- **Evaluation examples** (documentation): Here we put the data examples to benchmark the ability of agents when interacting with GUI. The examples are stored in ./examples where each data item formatted as: Evidence: `evaluation_examples/README.md`
- **Agent** (documentation): Supported Models We currently support the following models as the foundational models for the agents: - GPT-3.5 gpt-3.5-turbo-16k, ... - GPT-4 gpt-4-0125-preview, gpt-4-1106-preview, ... - GPT-4V gpt-4-vision-preview, ... - Gemini-Pro - Gemini-Pro-Vision - Claude-3, 2 claude-3-haiku-2024030, claude-3-sonnet-2024022, ... - ... Evidence: `mm_agents/README.md`
- **OSWorld Monitor** (documentation): A web-based monitoring dashboard for OSWorld tasks and executions. Evidence: `monitor/README.md`
- **Scripts Directory** (documentation): This directory contains all the run scripts for OSWorld, organized by type. Evidence: `scripts/README.md`
- **Evaluator Setup Details** (documentation): Evaluator Setup Details Setup scaffolding for the evaluators in the desktop environment for those who want to know the details of the evaluator setup for customized evaluation and extension Evidence: `desktop_env/evaluators/README.md`
- **Server setup** (documentation): This README is useful if you want to set up your own machine for the environment. This README is not yet finished. Please contact the author if you need any assistance. Evidence: `desktop_env/server/README.md`
- **Anthropic Agent Integration** (documentation): Anthropic Agent Integration Notice: As Anthropic API only supports image’s long edge is less than 1568 pixels and image is less than ~1,600 tokens, we resize the screenshot to 1280x720. Setup To run with the Anthropic API, you need to set up your environment with the necessary API keys and configurations. Follow these steps: 1. Install Dependencies : Ensure you have the required Python packages installed. You can do this by running: 2. Set Environment Variables : You need to set the environment variable with your API key. You can do this in .env: For aws bedrock: For anthropic, you need set APIProvider to anthropic and set the API key: Evidence: `mm_agents/anthropic/README.md`
- **aworldGUIAgent-v1** (documentation): aworldGUIAgent-v1 built on the AWorld Framework https://github.com/inclusionAI/AWorld , specifically designed to tackle complex desktop automation tasks within the OSWorld-verified https://os-world.github.io/ benchmark. Evidence: `mm_agents/aworldguiagent/README.md`
- **Readme** (documentation): 1. Get the URLs from majestic million and save them to majestic million.csv 2. Run scrapy spider to get the data from the URLs Evidence: `mm_agents/gui_som/data_preparation/README.md`
- **Kimi Agent** (documentation): KimiAgent is a computer use agent powered by the Kimi models developed by Moonshot AI. It observes the desktop through screenshots and emits pyautogui code for the OSWorld GUI executor to run. Evidence: `mm_agents/kimi/README.md`
- **Deploy CogAgent as server** (documentation): The CogAgent LLM will be deployed on http://127.0.0.1:8000 Evidence: `mm_agents/llm_server/CogAgent/README.md`
- **M3 Agent** (documentation): Standalone agent module for evaluating an M3-trained model on OSWorld https://github.com/xlang-ai/OSWorld . Provides the system prompt, response parser and screenshot handling needed to drive M3 through an Anthropic-Messages-compatible endpoint. Evidence: `mm_agents/m3/README.md`
- **Maestro Utilities** (documentation): This directory contains various utility functions for the Maestro project to improve code reusability and maintainability. Evidence: `mm_agents/maestro/utils/README.md`
- **Pointer Agent Integration** (documentation): 1. Setup Environment : You need to create a virutalenv and install requirements: Evidence: `mm_agents/pointer/README.md`
- **Qwen Agents** (documentation): This package contains OpenAI-compatible Qwen vision agents for OSWorld. Evidence: `mm_agents/qwen/README.md`
- **Surfer H — OSWorld Benchmark Runner** (documentation): Surfer H — OSWorld Benchmark Runner Evidence: `mm_agents/surferH/README.md`
- **UiPath Screen Agent** (documentation): 23 Dec 2025 - Updated the planner model to Claude 4.5 Opus https://www.anthropic.com/news/claude-opus-4-5 - Updated the grounder model to an internally finetuned version of Qwen3-VL https://github.com/QwenLM/Qwen3-VL and allowing it to predict "refusal" similar to OSWorld-G for elements that do not exist - Added memory for storing relevant information across steps - Improved utilization of the UI element detector for fine grained details such as cell corners - Refactoring and various small fixes Evidence: `mm_agents/uipath/README.md`
- **VLAA-GUI** (documentation): VLAA-GUI is a multi-agent system for desktop GUI automation. VLAA-GUI decomposes complex desktop tasks into perception, planning, and action through a pipeline of specialized sub-agents. Evidence: `mm_agents/vlaa_gui/README.md`
- **License** (source_file): Apache License Version 2.0, January 2004 http://www.apache.org/licenses/ Evidence: `LICENSE`
- **OSWorld Setup and Evaluation Guide** (documentation): This comprehensive guide covers all aspects of setting up and running OSWorld evaluations, including account configuration, proxy setup, and public evaluation platform deployment. Evidence: `SETUP_GUIDELINE.md`
- **Fallback to environment-based config if no configs found** (source_file): TASK STATUS CACHE = {} DONE STABILITY PERIOD = int os.getenv "DONE STABILITY PERIOD", "30" app = Flask name MONITOR IN DOCKER = os.getenv "MONITOR IN DOCKER", "false" .lower == "true" ⋮---- TASK CONFIG PATH = "/app/evaluation examples/test.json" EXAMPLES BASE PATH = "/app/evaluation examples/examples" RESULTS BASE PATH = "/app/results" ⋮---- TASK CONFIG PATH = os.getenv "TASK CONFIG PATH", "../evaluation examples/test.json" EXAMPLES BASE PATH = os.getenv "EXAMPLES BASE PATH", "../evaluation examples/examples" RESULTS BASE PATH = os.getenv "RESULTS BASE PATH", "../results" MAX STEPS = int os.getenv "MAX STEPS", "150" ⋮---- @cache def get default config ⋮---- action space path = os.path.join… Evidence: `monitor/main.py`
- **Restore original signal handlers** (source_file): logger = logging.getLogger "desktopenv.providers.aliyun.AliyunVMManager" ⋮---- ALIYUN INSTANCE TYPE = os.getenv "ALIYUN INSTANCE TYPE" ALIYUN ACCESS KEY ID = os.getenv "ALIYUN ACCESS KEY ID" ALIYUN ACCESS KEY SECRET = os.getenv "ALIYUN ACCESS KEY SECRET" ALIYUN REGION = os.getenv "ALIYUN REGION" ALIYUN IMAGE ID = os.getenv "ALIYUN IMAGE ID" ALIYUN SECURITY GROUP ID = os.getenv "ALIYUN SECURITY GROUP ID" ALIYUN VSWITCH ID = os.getenv "ALIYUN VSWITCH ID" ALIYUN RESOURCE GROUP ID = os.getenv "ALIYUN RESOURCE GROUP ID" WAIT DELAY = 20 MAX ATTEMPTS = 15 def allocate vm screen size= 1920, 1080 ⋮---- config = open api models.Config client = ECSClient config instance id = None original sigint handl… Evidence: `desktop_env/providers/aliyun/manager.py`
- **Wait until the instance reaches 'Running' state** (source_file): logger = logging.getLogger "desktopenv.providers.aliyun.AliyunProvider" ⋮---- class AliyunProvider Provider ⋮---- def init self, kwargs ⋮---- env use private = os.getenv "ALIYUN USE PRIVATE IP", "1" .lower in {"1", "true", "yes", "on"} kw flag = kwargs.get "use private ip", None ⋮---- def create client self - ECSClient ⋮---- config = open api models.Config ⋮---- def start emulator self, path to vm: str, headless: bool, args, kwargs ⋮---- response = self. describe instance path to vm ⋮---- instance = response.body.instances.instance 0 state = instance.status ⋮---- req = ecs models.StartInstanceRequest instance id=path to vm ⋮---- Wait until the instance reaches 'Running' state ⋮---- For all… Evidence: `desktop_env/providers/aliyun/provider.py`
- **Restore original signal handlers** (source_file): INSTANCE TYPE = "t3.xlarge" ⋮---- PROXY SUPPORT AVAILABLE = True ⋮---- PROXY SUPPORT AVAILABLE = False logger = logging.getLogger "desktopenv.providers.aws.AWSVMManager" ⋮---- DEFAULT REGION = "us-east-1" IMAGE ID MAP = { def allocate vm region=DEFAULT REGION, screen size= 1920, 1080 ⋮---- ami id = IMAGE ID MAP region screen size ec2 client = boto3.client 'ec2', region name=region instance id = None original sigint handler = signal.getsignal signal.SIGINT original sigterm handler = signal.getsignal signal.SIGTERM def signal handler sig, frame ⋮---- signal name = "SIGINT" if sig == signal.SIGINT else "SIGTERM" ⋮---- Restore original signal handlers ⋮---- Raise appropriate exception based on… Evidence: `desktop_env/providers/aws/manager.py`
- **If the instance is already running, skip starting it** (source_file): logger = logging.getLogger "desktopenv.providers.aws.AWSProvider" ⋮---- WAIT DELAY = 15 MAX ATTEMPTS = 10 class AWSProvider Provider ⋮---- def start emulator self, path to vm: str, headless: bool, args, kwargs ⋮---- ec2 client = boto3.client 'ec2', region name=self.region ⋮---- response = ec2 client.describe instances InstanceIds= path to vm state = response 'Reservations' 0 'Instances' 0 'State' 'Name' ⋮---- If the instance is already running, skip starting it ⋮---- Start the instance if it's currently stopped ⋮---- Wait until the instance reaches 'running' state waiter = ec2 client.get waiter 'instance running' ⋮---- For all other states terminated, pending, etc. , log a warning ⋮---- def… Evidence: `desktop_env/providers/aws/provider.py`
- **Manager** (source_file): logger = logging.getLogger "desktopenv.providers.azure.AzureVMManager" ⋮---- REGISTRY PATH = '.azure vms' def allocate vm region class AzureVMManager VMManager ⋮---- def init self, registry path=REGISTRY PATH def initialize registry self def add vm self, vm path, region ⋮---- lines = file.readlines vm path at vm region = "{}@{}".format vm path, region new lines = lines + f'{vm path at vm region} free\n' ⋮---- def occupy vm self, vm path, pid, region ⋮---- new lines = ⋮---- def check and clean self def list free vms self, region ⋮---- free vms = ⋮---- def get vm path self, region, screen size= 1920, 1080 , kwargs ⋮---- free vms paths = self.list free vms region ⋮---- new vm path = allocate v… Evidence: `desktop_env/providers/azure/manager.py`
- **Wait for the instance to start** (source_file): logger = logging.getLogger "desktopenv.providers.azure.AzureProvider" ⋮---- WAIT DELAY = 15 MAX ATTEMPTS = 10 class AzureProvider Provider ⋮---- def init self, region: str = None ⋮---- credential = DefaultAzureCredential ⋮---- def start emulator self, path to vm: str, headless: bool, os type: str = None, args, kwargs ⋮---- vm = self.compute client.virtual machines.get resource group name, vm name, expand='instanceView' power state = vm.instance view.statuses -1 .code ⋮---- async vm start = self.compute client.virtual machines.begin start resource group name, vm name ⋮---- Wait for the instance to start ⋮---- def get ip address self, path to vm: str - str ⋮---- vm = self.compute client.virtu… Evidence: `desktop_env/providers/azure/provider.py`
- **Base** (source_file): class Provider ABC ⋮---- def init self, region: str = None ⋮---- @abstractmethod def start emulator self, path to vm: str, headless: bool ⋮---- @abstractmethod def get ip address self, path to vm: str - str ⋮---- @abstractmethod def save state self, path to vm: str, snapshot name: str ⋮---- @abstractmethod def revert to snapshot self, path to vm: str, snapshot name: str - str ⋮---- @abstractmethod def stop emulator self, path to vm: str class VMManager ABC ⋮---- checked and cleaned = False ⋮---- @abstractmethod def initialize registry self, kwargs ⋮---- @abstractmethod def add vm self, vm path, kwargs ⋮---- @abstractmethod def delete vm self, vm path, kwargs ⋮---- @abstractmethod def occupy… Evidence: `desktop_env/providers/base.py`
- **This means the range was not satisfiable, possibly the file was fully downloaded** (source_file): logger = logging.getLogger "desktopenv.providers.docker.DockerVMManager" ⋮---- MAX RETRY TIMES = 10 RETRY INTERVAL = 5 UBUNTU X86 URL = "https://huggingface.co/datasets/xlangai/ubuntu osworld/resolve/main/Ubuntu.qcow2.zip" WINDOWS X86 URL = "https://huggingface.co/datasets/xlangai/windows osworld/resolve/main/Windows-10-x64.qcow2.zip" VMS DIR = "./docker vm data" URL = UBUNTU X86 URL DOWNLOADED FILE NAME = URL.split '/' -1 ⋮---- docker path = r"C:\Program Files\Docker\Docker" ⋮---- def download vm vms dir: str ⋮---- downloaded size = 0 hf endpoint = os.environ.get 'HF ENDPOINT' ⋮---- URL = URL.replace 'huggingface.co', 'hf-mirror.com' ⋮---- downloaded file name = DOWNLOADED FILE NAME ⋮----… Evidence: `desktop_env/providers/docker/manager.py`
- **Wait for VM to be ready** (source_file): logger = logging.getLogger "desktopenv.providers.docker.DockerProvider" ⋮---- WAIT TIME = 3 RETRY INTERVAL = 1 LOCK TIMEOUT = 10 class PortAllocationError Exception class DockerProvider Provider ⋮---- def init self, region: str ⋮---- temp dir = Path os.getenv 'TEMP' if platform.system == 'Windows' else '/tmp' ⋮---- def get used ports self ⋮---- system ports = set conn.laddr.port for conn in psutil.net connections docker ports = set ⋮---- ports = container.attrs 'NetworkSettings' 'Ports' ⋮---- def get available port self, start port: int - int ⋮---- used ports = self. get used ports port = start port ⋮---- def wait for vm ready self, timeout: int = 300 ⋮---- """Wait for VM to be ready by che… Evidence: `desktop_env/providers/docker/provider.py`
- **Manager**（source_file）：logger = logging.getLogger "desktopenv.providers.fastvm.FastvmVMManager" ⋮---- def osworld firewall - dict def allocate vm snapshot id: Optional str , machine type: str - str ⋮---- client = get client vm id: Optional str = None prev int = signal.getsignal signal.SIGINT prev term = signal.getsignal signal.SIGTERM def handler sig, frame ⋮---- name = "SIGINT" if sig == signal.SIGINT else "SIGTERM" ⋮---- except Exception as cleanup err: noqa: BLE001 ⋮---- launch kwargs: dict str, Any = { ⋮---- vm = client.vms.launch launch kwargs vm id = vm.id ⋮---- class FastvmVMManager VMManager ⋮---- """FastVM has no persistent VM pool — everything is allocate-on-demand.""" def init self, kwargs: Any - None… 证据：`desktop_env/providers/fastvm/manager.py`
- **------------------------------------------------------------------**（source_file）：logger = logging.getLogger "desktopenv.providers.fastvm.FastvmProvider" ⋮---- RUNNING = "running" TERMINAL FAILURE = {"error", "stopped", "deleting"} def osworld firewall - dict def bracket host: str - str def format vm ip ports host: str - str ⋮---- """Emit the host:server:chromium:vnc:vlc tuple DesktopEnv expects.""" ⋮---- def wait for server host: str, timeout: float - None ⋮---- """Poll the in-VM Flask server until /screenshot returns 200. Raises TimeoutError on budget exhaustion so the caller can clean up. """ url = f"http://{ bracket host }:{SERVER PORT}/screenshot" deadline = time.monotonic + timeout last err: Optional str = None ⋮---- r = requests.get url, timeout= 5, 5 ⋮---- last e… 证据：`desktop_env/providers/fastvm/provider.py`
- **Download the virtual machine image**（source_file）：logger = logging.getLogger "desktopenv.providers.virtualbox.VirtualBoxVMManager" ⋮---- MAX RETRY TIMES = 10 RETRY INTERVAL = 5 UBUNTU ARM URL = "NOT AVAILABLE" UBUNTU X86 URL = "https://huggingface.co/datasets/xlangai/ubuntu x86 virtualbox/resolve/main/Ubuntu.zip" DOWNLOADED FILE NAME = "Ubuntu.zip" REGISTRY PATH = '.virtualbox vms' LOCK FILE NAME = '.virtualbox lck' VMS DIR = "./virtualbox vm data" update lock = threading.Lock ⋮---- vboxmanage path = r"C:\Program Files\Oracle\VirtualBox" ⋮---- def generate new vm name vms dir, os type ⋮---- registry idx = 0 ⋮---- attempted new name = f"{os type}{registry idx}" ⋮---- def install vm vm name, vms dir, downloaded file name, original vm name="U… 证据：`desktop_env/providers/virtualbox/manager.py`
- **Note: os type parameter is ignored for VirtualBox provider**（source_file）：logger = logging.getLogger "desktopenv.providers.virtualbox.VirtualBoxProvider" ⋮---- WAIT TIME = 3 class VirtualBoxProvider Provider ⋮---- @staticmethod def execute command command: list ⋮---- result = subprocess.run command, stdout=subprocess.PIPE, stderr=subprocess.PIPE, timeout=60, text=True, ⋮---- @staticmethod def get vm uuid path to vm: str ⋮---- output = subprocess.check output f"VBoxManage list vms", shell=True, stderr=subprocess.STDOUT output = output.decode output = output.splitlines ⋮---- tree = ET.parse path to vm root = tree.getroot machine element = root.find './/{http://www.virtualbox.org/}Machine' ⋮---- uuid = machine element.get 'uuid' 1:-1 ⋮---- uuid = line.split 1 1:-1 ⋮… 证据：`desktop_env/providers/virtualbox/provider.py`
- **Write the updated content back to the file**（source_file）：logger = logging.getLogger "desktopenv.providers.vmware.VMwareVMManager" ⋮---- MAX RETRY TIMES = 10 RETRY INTERVAL = 5 UBUNTU ARM URL = "https://huggingface.co/datasets/xlangai/ubuntu osworld/resolve/main/Ubuntu-arm.zip" UBUNTU X86 URL = "https://huggingface.co/datasets/xlangai/ubuntu osworld/resolve/main/Ubuntu-x86.zip" WINDOWS X86 URL = "https://huggingface.co/datasets/xlangai/windows osworld/resolve/main/Windows-x86.zip" ⋮---- URL = UBUNTU X86 URL ⋮---- URL = UBUNTU ARM URL ⋮---- DOWNLOADED FILE NAME = URL.split '/' -1 REGISTRY PATH = '.vmware vms' LOCK FILE NAME = '.vmware lck' VMS DIR = "./vmware vm data" update lock = threading.Lock ⋮---- vboxmanage path = r"C:\Program Files x86 \VMwa… 证据：`desktop_env/providers/vmware/manager.py`
- **Provider**（source_file）：logger = logging.getLogger "desktopenv.providers.vmware.VMwareProvider" ⋮---- WAIT TIME = 3 def get vmrun type return list=False class VMwareProvider Provider ⋮---- @staticmethod def execute command command: list, return output=False ⋮---- process = subprocess.Popen ⋮---- output = process.communicate 0 .strip ⋮---- def start emulator self, path to vm: str, headless: bool, os type: str ⋮---- output = subprocess.check output f"vmrun {get vmrun type } list", shell=True, stderr=subprocess.STDOUT output = output.decode output = output.splitlines normalized path to vm = os.path.abspath os.path.normpath path to vm ⋮---- command = "vmrun" + get vmrun type return list=True + "start", path to vm ⋮---… 证据：`desktop_env/providers/vmware/provider.py`
- **Restore original signal handlers**（source_file）：logger = logging.getLogger "desktopenv.providers.volcengine.VolcengineVMManager" ⋮---- VOLCENGINE ACCESS KEY ID = os.getenv "VOLCENGINE ACCESS KEY ID" VOLCENGINE SECRET ACCESS KEY = os.getenv "VOLCENGINE SECRET ACCESS KEY" VOLCENGINE REGION = os.getenv "VOLCENGINE REGION" VOLCENGINE SUBNET ID = os.getenv "VOLCENGINE SUBNET ID" VOLCENGINE SECURITY GROUP ID = os.getenv "VOLCENGINE SECURITY GROUP ID" VOLCENGINE INSTANCE TYPE = os.getenv "VOLCENGINE INSTANCE TYPE" VOLCENGINE IMAGE ID = os.getenv "VOLCENGINE IMAGE ID" VOLCENGINE ZONE ID = os.getenv "VOLCENGINE ZONE ID" VOLCENGINE DEFAULT PASSWORD = os.getenv "VOLCENGINE DEFAULT PASSWORD" def allocate vm screen size= 1920, 1080 ⋮---- configuratio… 证据：`desktop_env/providers/volcengine/manager.py`
- **启动实例**（source_file）：logger = logging.getLogger "desktopenv.providers.volcengine.VolcengineProvider" ⋮---- WAIT DELAY = 15 MAX ATTEMPTS = 10 class VolcengineProvider Provider ⋮---- def init self, kwargs def create client self - ECSApi ⋮---- configuration = volcenginesdkcore.Configuration ⋮---- def start emulator self, path to vm: str, headless: bool, args, kwargs ⋮---- instance info = self.client.describe instances ecs models.DescribeInstancesRequest status = instance info.instances 0 .status ⋮---- 启动实例 ⋮---- 等待实例运行 ⋮---- def get ip address self, path to vm: str - str ⋮---- public ip = instance info.instances 0 .eip address.ip address private ip = instance info.instances 0 .network interfaces 0 .primary ip addr… 证据：`desktop_env/providers/volcengine/provider.py`
- **Expand user directory** (source_file): platform name: str = platform.system ⋮---- BaseWrapper = Any ⋮---- Accessible = Any ⋮---- Accessible = None ⋮---- app = Flask name ⋮---- TIMEOUT = 1800 logger = app.logger recording process = None recording path = "/tmp/recording.mp4" ⋮---- @app.route '/setup/execute', methods= 'POST' @app.route '/execute', methods= 'POST' def execute command ⋮---- data = request.json shell = data.get 'shell', False command = data.get 'command', "" if shell else ⋮---- command = shlex.split command Expand user directory ⋮---- Execute the command without any safety checks. ⋮---- flags = subprocess.CREATE NO WINDOW ⋮---- flags = 0 result = subprocess.run ⋮---- @app.route '/setup/execute with verification', met… Evidence: `desktop_env/server/main.py`
- **Handle mouse move and drag actions** (source_file): logger = logging.getLogger "desktopenv.agent" API RETRY TIMES = 500 API RETRY INTERVAL = 5 class AnthropicAgent ⋮---- def get sampling params self ⋮---- params = {} ⋮---- def add tool result self, tool call id: str, result: str, screenshot: bytes = None ⋮---- tool result content = ⋮---- screenshot base64 = base64.b64encode screenshot .decode 'utf-8' ⋮---- def extract raw response string self, response - str ⋮---- raw response str = "" ⋮---- def parse actions from tool call self, tool call: Dict - str ⋮---- result = "" function args = batched actions = function args.get "actions" ⋮---- action = function args.get "action" ⋮---- action = tool call.function.name action conversion = { action = a… Evidence: `mm_agents/anthropic/main.py`
- **Base** (source_file): class BaseAnthropicTool metaclass=ABCMeta ⋮---- @abstractmethod def call self, kwargs - Any ⋮---- @dataclass frozen=True class ToolResult ⋮---- output: Optional str = None error: Optional str = None base64 image: Optional str = None system: Optional str = None def bool self def add self, other: "ToolResult" def replace self, kwargs class CLIResult ToolResult class ToolFailure ToolResult class ToolError Exception ⋮---- def init self, message Evidence: `mm_agents/anthropic/tools/base.py`
- **Find all non-overlapping matches in the string** (source_file): logger = logging.getLogger "desktopenv.agent" pure text settings = "a11y tree" def parse code from string input string ⋮---- pattern = r" " Find all non-overlapping matches in the string matches = re.findall pattern, input string, re.DOTALL The regex above captures the content inside the triple backticks. The re.DOTALL flag allows the dot . to match newline characters as well, so the code inside backticks can span multiple lines. matches now contains all the captured code snippets codes = ⋮---- match = match.strip commands = "WAIT", "DONE", "FAIL" ⋮---- class AutoGLMAgent ⋮---- @property def turn number self def prepare self, instruction: str, obs: Dict, history: List, last result: str = ""… Evidence: `mm_agents/autoglm/main.py`
- **Find all non-overlapping matches in the string** (source_file): logger = logging.getLogger "desktopenv.agent" pure text settings = "a11y tree" def resize image image, w, h ⋮---- img = Image.open BytesIO image img = img.resize w, h buf = BytesIO ⋮---- img bytes = buf.getvalue ⋮---- def parse code from string input string ⋮---- pattern = r" " Find all non-overlapping matches in the string matches = re.findall pattern, input string, re.DOTALL The regex above captures the content inside the triple backticks. The re.DOTALL flag allows the dot . to match newline characters as well, so the code inside backticks can span multiple lines. matches now contains all the captured code snippets codes = ⋮---- match = match.strip commands = "WAIT", "DONE", "FAIL" ⋮----… Evidence: `mm_agents/autoglm_v/main.py`
- **Configure text grounding agent** (source_file): class ACI ⋮---- def init self def agent action func UBUNTU APP SETUP = f"""import subprocess; SET CELL VALUES CMD = """import uno class OSWorldACI ACI ⋮---- llm config = AgentConfig ⋮---- Configure text grounding agent ⋮---- Given the state and worker's referring expression, use the grounding model to generate x,y def generate coords self, ref expr: str, obs: Dict - List int ⋮---- Reset the grounding model state self.grounding model.reset Configure the context, UI-TARS demo does not use system prompt prompt = f"Query:{ref expr}\nOutput only the coordinate of one point in your response.\n" grounding message = { response = call llm model ⋮---- numericals = re.findall r"\d+", response ⋮---- Ca… Evidence: `mm_agents/aworldguiagent/grounding.py`
- **Agent Capability** (source_file): class AgentCapability ⋮---- def init self def add to agent self, agent: ConversableAgent Evidence: `mm_agents/coact/autogen/agentchat/contrib/capabilities/agent_capability.py`
- **Tools Capability** (source_file): class ToolsCapability ⋮---- def init self, tool list: list Tool def add to agent self, agent: ConversableAgent Evidence: `mm_agents/coact/autogen/agentchat/contrib/capabilities/tools_capability.py`
- **Transforms** (source_file): class MessageTransform Protocol ⋮---- def apply transform self, messages: list dict str, Any - list dict str, Any ⋮---- class MessageHistoryLimiter ⋮---- exclude names = getattr self, " exclude names", None filtered = msg for msg in messages if msg.get "name" not in exclude names if exclude names else messages ⋮---- truncated messages = remaining count = self. max messages ⋮---- truncated messages = filtered 0 ⋮---- pre transform messages len = len pre transform messages post transform messages len = len post transform messages ⋮---- logs str = ⋮---- def validate max messages self, max messages: Optional int class MessageTokenLimiter ⋮---- temp messages = copy.deepcopy messages processed me… Evidence: `mm_agents/coact/autogen/agentchat/contrib/capabilities/transforms.py`
- **Base** (source_file): all = "CodeBlock", "CodeExecutionConfig", "CodeExecutor", "CodeExtractor", "CodeResult" ⋮---- @export module "autogen.coding" class CodeBlock BaseModel ⋮---- code: str = Field description="The code to execute." language: str = Field description="The language of the code." ⋮---- @export module "autogen.coding" class CodeResult BaseModel ⋮---- exit code: int = Field description="The exit code of the code execution." output: str = Field description="The output of the code execution." ⋮---- @export module "autogen.coding" class CodeExtractor Protocol ⋮---- @runtime checkable @export module "autogen.coding" class CodeExecutor Protocol ⋮---- @property def code extractor self - CodeExtractor def e… Evidence: `mm_agents/coact/autogen/coding/base.py`
- **Base** (source_file): @dataclass @export module "autogen.coding.jupyter" class JupyterConnectionInfo ⋮---- host: str use https: bool port: Optional int = None token: Optional str = None ⋮---- @runtime checkable @export module "autogen.coding.jupyter" class JupyterConnectable Protocol ⋮---- @property def connection info self - JupyterConnectionInfo Evidence: `mm_agents/coact/autogen/coding/jupyter/base.py`
- **Provider** (source_file): class Provider ⋮---- dependency overrides: Dict Callable ..., Any , Callable ..., Any def init self - None def clear self - None ⋮---- dependency provider = Provider Evidence: `mm_agents/coact/autogen/fast_depends/dependencies/provider.py`
- **Base** (source_file): all = "IOStream", "InputStream", "OutputStream" logger = logging.getLogger name ⋮---- @runtime checkable @export module "autogen.io" class OutputStream Protocol ⋮---- def print self, objects: Any, sep: str = " ", end: str = "\n", flush: bool = False - None def send self, message: BaseEvent - None ⋮---- @runtime checkable @export module "autogen.io" class InputStream Protocol ⋮---- def input self, prompt: str = "", , password: bool = False - str ⋮---- """Read a line from the input stream. Args: prompt str, optional : The prompt to display. Defaults to "". password bool, optional : Whether to read a password. Defaults to False. Returns: str: The line read from the input stream. """ ... pragma… Evidence: `mm_agents/coact/autogen/io/base.py`
- **Base** (source_file): all = "AsyncEventProcessorProtocol", "EventProcessorProtocol" ⋮---- @export module "autogen.io" class EventProcessorProtocol Protocol ⋮---- def process self, response: "RunResponseProtocol" - None: ... ⋮---- @export module "autogen.io" class AsyncEventProcessorProtocol Protocol ⋮---- async def process self, response: "AsyncRunResponseProtocol" - None: ... Evidence: `mm_agents/coact/autogen/io/processors/base.py`
- **Credentials Hosted Provider** (source_file): all = "GoogleCredenentialsHostedProvider" ⋮---- @export module "autogen.tools.experimental.google.authentication" class GoogleCredenentialsHostedProvider GoogleCredentialsProvider ⋮---- @property def host self - str ⋮---- @property def port self - int def get credentials self - "Credentials" Evidence: `mm_agents/coact/autogen/tools/experimental/google/authentication/credentials_hosted_provider.py`
- **Credentials Local Provider** (source_file): all = "GoogleCredentialsLocalProvider" ⋮---- @export module "autogen.tools.experimental.google.authentication" class GoogleCredentialsLocalProvider GoogleCredentialsProvider ⋮---- @property def host self - str ⋮---- @property def port self - int ⋮---- def refresh or get new credentials self, creds: Optional "Credentials" - "Credentials" ⋮---- flow = InstalledAppFlow.from client secrets file self.client secret file, self.scopes creds = flow.run local server host=self.host, port=self.port ⋮---- def get credentials self - "Credentials" ⋮---- creds = None ⋮---- creds = Credentials.from authorized user file self.token file ⋮---- creds = self. refresh or get new credentials creds Evidence: `mm_agents/coact/autogen/tools/experimental/google/authentication/credentials_local_provider.py`
- **Credentials Provider** (source_file): all = "GoogleCredentialsProvider" ⋮---- @runtime checkable @export module "autogen.tools.experimental.google.authentication" class GoogleCredentialsProvider Protocol ⋮---- def get credentials self - Optional "Credentials" ⋮---- @property def host self - str ⋮---- @property def port self - int Evidence: `mm_agents/coact/autogen/tools/experimental/google/authentication/credentials_provider.py`
- **Build tool dictionary, preserving all configuration fields** (source_file): logger = logging.getLogger name class ConfigManager ⋮---- def load tools configuration self - Dict str, Any ⋮---- tools config path = os.path.join ⋮---- Build tool dictionary, preserving all configuration fields ⋮---- tool name = tool "tool name" ⋮---- def setup knowledge base self, platform: str - str ⋮---- """Initialize agent's knowledge base path and check if it exists""" ⋮---- Initialize agent's knowledge base path local kb path = os.path.join self.memory root path, self.memory folder name Check if knowledge base exists kb platform path = os.path.join local kb path, platform ⋮---- def get tools dict self - Dict str, Any ⋮---- """Get tools dictionary""" ⋮---- def get tools config self -… Evidence: `mm_agents/maestro/maestro/controller/config_manager.py`
- **If an element description is provided and coords1 was already obtained in assign coordinates, dispatch the coordinates** (source_file): logger = logging.getLogger "desktopenv.agent" class ACI ⋮---- def init self def agent action func class Grounding ACI ⋮---- def generate coords self, ref expr: str, obs: Dict - List int ⋮---- grounding start time = time.time ⋮---- prompt = ⋮---- grounding end time = time.time grounding duration = grounding end time - grounding start time ⋮---- numericals = re.findall r"\d+", response ⋮---- def assign coordinates self, plan: str, obs: Dict ⋮---- action = parse single code from string function name = re.match r" \w+\.\w+ \ ", ⋮---- action .group 1 type: ignore args = self.parse function args action ⋮---- def reset screen size self, width: int, height: int def resize coordinates self, coordina… Evidence: `mm_agents/maestro/maestro/grounding.py`
- **New Manager** (source_file): logger = logging.getLogger name class NewManager ⋮---- def initialize tools self def initialize knowledge base self ⋮---- kb tools dict = { ⋮---- def initialize handlers self def plan task self, scenario: Union PlanningScenario, str - PlanningResult ⋮---- scenario enum = self. normalize scenario scenario ⋮---- current trigger code = self. get current trigger code ⋮---- """Normalize string/enum scenario to PlanningScenario enum case-insensitive .""" ⋮---- s = str scenario .strip .lower ⋮---- def handle planning scenario self, scenario: PlanningScenario, trigger code: str = "controller" - PlanningResult def handle supplement scenario self - PlanningResult ⋮---- result = self.supplement handle… Evidence: `mm_agents/maestro/maestro/new_manager.py`
- **GENERATE ACTION or others: keep raw action** (source_file): logger = logging.getLogger name class NewWorker ⋮---- def normalize action for outcome self, outcome: str, raw action: Optional Dict str, Any , message: str - Dict str, Any ⋮---- action: Dict str, Any = {} ⋮---- action = {"type": "Stale"} ⋮---- GENERATE ACTION or others: keep raw action ⋮---- def process subtask and create command self - Optional str ⋮---- """Route to the right role, create command/decision if applicable, and return worker decision string. Returns one of WorkerDecision values or None on no-op/error. """ subtask id = self. global state.get task .current subtask id subtask = self. global state.get subtask subtask id type: ignore ⋮---- current trigger code = self. get current… Evidence: `mm_agents/maestro/maestro/new_worker.py`
- The remaining 17 evidence items are listed in `AI_CONTEXT_PACK.json` or `EVIDENCE_INDEX.json`.
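
The `mm_agents/autoglm/main.py` extract above describes pulling code out of triple-backtick fences with `re.findall` and `re.DOTALL`, but the pattern itself is garbled in the extract. The following is a minimal sketch of that technique only. `PATTERN`, `COMMANDS`, and `parse_code_blocks` are names assumed for illustration, and the pattern is a guess at a workable shape, not a copy of the repository's code.

```python
import re

# Build the fence programmatically so this example can itself live inside a fence.
FENCE = "`" * 3

# Assumed pattern: an optional language tag, then the body up to the closing fence.
PATTERN = FENCE + r"(?:\w+\s+)?(.*?)" + FENCE

# Control tokens that appear in the extract.
COMMANDS = ("WAIT", "DONE", "FAIL")


def parse_code_blocks(text: str) -> list[str]:
    """Return the stripped body of every fenced block found in `text`."""
    # re.DOTALL lets "." match newlines, so a block may span several lines.
    matches = re.findall(PATTERN, text, re.DOTALL)
    return [match.strip() for match in matches]


reply = (
    "Plan:\n" + FENCE + "python\nx = 1\ny = 2\n" + FENCE
    + "\nthen\n" + FENCE + "\nDONE\n" + FENCE
)
blocks = parse_code_blocks(reply)
control = [b for b in blocks if b in COMMANDS]
```

Whether the real parser treats a block such as `DONE` as a control token in exactly this way must be checked against the source file after installation.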

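## Rules the Host AI Must Follow

- **Treat this asset as pre-work context, not as a runtime environment.** The AI Context Pack contains only an evidence-backed understanding of the project, not any executable state of the target project. Evidence: `README.md`, `evaluation_examples/README.md`, `mm_agents/README.md`
- **When answering the user, distinguish what can be previewed from what can only be verified after installation.** The value of a pre-install preview comes from reducing mistaken installs and misjudgments, not from posing as a real run. Evidence: `README.md`, `evaluation_examples/README.md`, `mm_agents/README.md`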

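## Questions the User Should Answer Before Starting

- In which host AI or local environment do you plan to use it?
- Do you only want to try the workflow first, or are you preparing a real installation?
- What matters most to you: installation cost, output quality, or conflicts with your existing rules?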

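## Acceptance Criteria

- Every capability claim can be traced back to a file path in evidence_refs.
- AI_CONTEXT_PACK.md does not present a preview as a real run.
- The user can understand within 3 minutes who the project suits, what it can do, how to start, and where the risk boundaries are.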

---

## Doramagic Context Augmentation

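The content below reinforces the main Repomix/AI Context Pack body. The Human Manual only provides a reading skeleton; the pitfall log is converted into working constraints that the host AI must follow.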

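## Human Manual Skeleton

Usage rule: this is only a reading route and a set of salience signals for the project, not a factual authority. Specific facts must still go back to repo evidence / Claim Graph.

Host AI hard rules:
- Do not treat page titles, section order, summaries, or importance as evidence of project facts.
- When explaining the Human Manual skeleton, state explicitly that it is only a reading route / salience signal.
- Judgments about capability, installation, compatibility, runtime status, and risk must cite repo evidence, a source path, or the Claim Graph.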

- **Overview and System Architecture**: importance `high`
  - source_paths: README.md, desktop_env/desktop_env.py, desktop_env/__init__.py, desktop_env/actions.py, run.py
- **Environment Providers and Deployment**: importance `high`
  - source_paths: desktop_env/providers/base.py, desktop_env/providers/vmware/provider.py, desktop_env/providers/vmware/manager.py, desktop_env/providers/virtualbox/provider.py, desktop_env/providers/docker/provider.py
- **Agent Framework and Baselines**: importance `high`
  - source_paths: mm_agents/README.md, mm_agents/agent.py, mm_agents/prompts.py, mm_agents/anthropic/main.py, mm_agents/qwen/main.py
- **Evaluators, Benchmark Tasks, and Known Issues**: importance `high`
  - source_paths: desktop_env/evaluators/__init__.py, desktop_env/evaluators/README.md, desktop_env/evaluators/getters/chrome.py, desktop_env/evaluators/getters/gimp.py, desktop_env/evaluators/getters/vscode.py

## Repo Inspection Evidence

- repo_clone_verified: true
- repo_inspection_verified: true
- repo_commit: `fe8c78e15a1149e82d54137e9ffef18aee710ed7`
- inspected_files: `requirements.txt`, `uv.lock`, `pyproject.toml`, `README.md`

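Host AI hard rules:
- Without repo_clone_verified=true, do not claim to have read the source code.
- Without repo_inspection_verified=true, do not state judgments drawn from README/docs/package files as facts.
- Without quick_start_verified=true, do not claim that the Quick Start has been run successfully.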
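
These three rules amount to a simple gate from flags to permitted statements. A minimal sketch, assuming the flags arrive as a plain mapping and that a missing flag counts as false (`GATED_STATEMENTS` and `permitted_statements` are illustrative names, not part of Doramagic):

```python
# Flag name -> the statement a host AI may make only when that flag is true.
GATED_STATEMENTS = {
    "repo_clone_verified": "the source code has been read",
    "repo_inspection_verified": "README/docs/package findings are facts",
    "quick_start_verified": "the Quick Start has been run successfully",
}


def permitted_statements(flags):
    """Return the gated statements that the recorded flags allow."""
    # Only an explicit True unlocks a statement; absent flags stay locked.
    return [
        statement
        for name, statement in GATED_STATEMENTS.items()
        if flags.get(name) is True
    ]


# Flags as recorded in this pack: quick_start_verified is not listed.
pack_flags = {"repo_clone_verified": True, "repo_inspection_verified": True}
allowed = permitted_statements(pack_flags)
```

With the flags recorded in this pack, the Quick Start statement stays locked, which matches the third rule.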

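## Doramagic Pitfall Constraints

These rules come from project-specific pitfalls found during Doramagic's discovery, verification, or compilation. The host AI must treat them as working constraints, not as ordinary explanatory text.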

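### Constraint 1: Source evidence: Guest VM shows a Snap Store "software updates available" popup on reset, derailing screenshot agents

- Trigger: GitHub community evidence shows an installation-related issue pending verification in this project: Guest VM shows a Snap Store "software updates available" popup on reset, derailing screenshot agents
- Why it matters: It may increase the cost of trial and production adoption for new users.
- Evidence: community_evidence:github | https://github.com/xlang-ai/OSWorld/issues/515 | The source discussion mentions docker-related conditions, which should be rechecked before installation or trial.
- Hard boundary: Do not present this pitfall as resolved, verified, or negligible unless later verification evidence explicitly shows that it has been closed.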

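### Constraint 2: Source evidence: Proposal: trace diagnostics for computer-use agent failures

- Trigger: GitHub community evidence shows a maintenance/version-related issue pending verification in this project: Proposal: trace diagnostics for computer-use agent failures
- Why it matters: It may increase the cost of trial and production adoption for new users.
- Evidence: community_evidence:github | https://github.com/xlang-ai/OSWorld/issues/514 | A usage condition pending verification, surfaced by source type github_issue.
- Hard boundary: Do not present this pitfall as resolved, verified, or negligible unless later verification evidence explicitly shows that it has been closed.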

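### Constraint 3: Source evidence: Container starts but Chrome DevTools port returns 400, even with clean happysixd/osworld-docker image and verified proxy

- Trigger: GitHub community evidence shows a security/permissions-related issue pending verification in this project: Container starts but Chrome DevTools port returns 400, even with clean happysixd/osworld-docker image and verified proxy
- Why it matters: It may increase the cost of trial and production adoption for new users.
- Evidence: community_evidence:github | https://github.com/xlang-ai/OSWorld/issues/495 | The source discussion mentions python-related conditions, which should be rechecked before installation or trial.
- Hard boundary: Do not present this pitfall as resolved, verified, or negligible unless later verification evidence explicitly shows that it has been closed.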

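### Constraint 4: Source evidence: Feasible-task evaluators return reward=1 without verifying the task was done (loose substring matching, no causation/de…

- Trigger: GitHub community evidence shows a security/permissions-related issue pending verification in this project: Feasible-task evaluators return reward=1 without verifying the task was done (loose substring matching, no causation/delta check)
- Why it matters: It may affect authorization, key configuration, or security boundaries.
- Evidence: community_evidence:github | https://github.com/xlang-ai/OSWorld/issues/518 | The source discussion mentions python-related conditions, which should be rechecked before installation or trial.
- Hard boundary: Do not present this pitfall as resolved, verified, or negligible unless later verification evidence explicitly shows that it has been closed.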

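### Constraint 5: Source evidence: Pixel-blind CLI agent scores 77.9% on OSWorld test_all (vs 64.3% vision) — sharing a CLI baseline + intent-aware judge

- Trigger: GitHub community evidence shows a security/permissions-related issue pending verification in this project: Pixel-blind CLI agent scores 77.9% on OSWorld test_all (vs 64.3% vision) — sharing a CLI baseline + intent-aware judge
- Why it matters: It may block installation or the first run.
- Evidence: community_evidence:github | https://github.com/xlang-ai/OSWorld/issues/517 | The source discussion mentions docker-related conditions, which should be rechecked before installation or trial.
- Hard boundary: Do not present this pitfall as resolved, verified, or negligible unless later verification evidence explicitly shows that it has been closed.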

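### Constraint 6: Capability judgment depends on assumptions

- Trigger: README/documentation is current enough for a first validation pass.
- Host AI rule: Turn the assumption into a downstream verification checklist.
- Why it matters: If the assumption does not hold, the user will not get the promised capability.
- Evidence: capability.assumptions | github_repo:705433049 | https://github.com/xlang-ai/OSWorld | README/documentation is current enough for a first validation pass.
- Hard boundary: Do not present this pitfall as resolved, verified, or negligible unless later verification evidence explicitly shows that it has been closed.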

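### Constraint 7: Maintenance activity unknown

- Trigger: last_activity_observed was not recorded.
- Host AI rule: Add signals from recent GitHub commits, releases, and issue/PR responses.
- Why it matters: New, abandoned, and active projects get mixed together, which lowers trust in the recommendation.
- Evidence: evidence.maintainer_signals | github_repo:705433049 | https://github.com/xlang-ai/OSWorld | last_activity_observed missing
- Hard boundary: Do not present this pitfall as resolved, verified, or negligible unless later verification evidence explicitly shows that it has been closed.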

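### Constraint 8: Downstream validation risk item (`no_demo`)

- Trigger: no_demo
- Evidence: downstream_validation.risk_items | github_repo:705433049 | https://github.com/xlang-ai/OSWorld | no_demo; severity=medium
- Hard boundary: Do not present this pitfall as resolved, verified, or negligible unless later verification evidence explicitly shows that it has been closed.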

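### Constraint 9: Scoring risk present

- Trigger: no_demo
- Why it matters: The risk affects whether the project is suitable for ordinary users to install.
- Evidence: risks.scoring_risks | github_repo:705433049 | https://github.com/xlang-ai/OSWorld | no_demo; severity=medium
- Hard boundary: Do not present this pitfall as resolved, verified, or negligible unless later verification evidence explicitly shows that it has been closed.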

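### Constraint 10: Issue/PR response quality unknown

- Trigger: issue_or_pr_quality=unknown.
- Host AI rule: Sample recent issues/PRs and judge whether they go unhandled for long periods.
- Why it matters: Users cannot tell whether anyone will maintain the project when they run into a problem.
- Evidence: evidence.maintainer_signals | github_repo:705433049 | https://github.com/xlang-ai/OSWorld | issue_or_pr_quality=unknown
- Hard boundary: Do not present this pitfall as resolved, verified, or negligible unless later verification evidence explicitly shows that it has been closed.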
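
The hard boundary repeated under each constraint can be expressed as a default-open status. A minimal sketch, assuming a hypothetical `closure_evidence` field that holds a reference to later verification evidence; neither the class nor the field is part of the pack's schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PitfallConstraint:
    number: int
    evidence_url: str
    # Hypothetical field: a reference to later verification evidence, if any.
    closure_evidence: Optional[str] = None


def reportable_status(constraint: PitfallConstraint) -> str:
    """Describe a constraint without overstating its resolution."""
    if constraint.closure_evidence:
        return f"closed per {constraint.closure_evidence}"
    # Default: never call it resolved, verified, or negligible.
    return "open, pending verification"


snap_popup = PitfallConstraint(1, "https://github.com/xlang-ai/OSWorld/issues/515")
status = reportable_status(snap_popup)
```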