
AgentNet

ModelScope community, updated 2026-01-09, indexed 2025-08-23
Download link:
https://modelscope.cn/datasets/xlangai/AgentNet
Resource overview:
<h1 style=" font-family:-apple-system,BlinkMacSystemFont,'Segoe UI',Helvetica,Arial,sans-serif; font-size:48px; font-weight:700; line-height:1.25; text-align:center; margin:0 0 24px;">
  OpenCUA: Open Foundations for Computer-Use Agents
</h1>

<div style=" display:flex; justify-content:center; gap:12px; flex-wrap:wrap; margin-bottom:28px;">
  <a href="https://opencua.xlang.ai/" style=" display:inline-block; padding:8px 24px; background:#2b2b2b; color:#ffffff; border-radius:36px; text-decoration:none; font-weight:600; font-size:16px;">
    🌐 Website
  </a>
  <a href="https://agentnet_data_viewer.xlang.ai/" style=" display:inline-block; padding:8px 24px; background:#2b2b2b; color:#ffffff; border-radius:36px; text-decoration:none; font-weight:600; font-size:16px;">
    🔎 Data Viewer
  </a>
  <a href="https://arxiv.org/abs/2508.09123" style=" display:inline-block; padding:8px 24px; background:#2b2b2b; color:#ffffff; border-radius:36px; text-decoration:none; font-weight:600; font-size:16px;">
    📝 Paper
  </a>
  <a href="https://github.com/xlang-ai/OpenCUA" style=" display:inline-block; padding:8px 24px; background:#2b2b2b; color:#ffffff; border-radius:36px; text-decoration:none; font-weight:600; font-size:16px;">
    💻 Code
  </a>
</div>

<div style="max-width:900px;margin:0 auto;">

<div style="text-align:center;">

# AgentNet Dataset

</div>

AgentNet is the first large-scale desktop computer-use agent trajectory dataset, containing 22.6K human-annotated computer-use tasks across Windows, macOS, and Ubuntu systems.

## Applications

This dataset enables training and evaluation of:

- Vision-language-action (VLA) models for computer use
- Multi-modal agents for desktop automation
- GUI understanding and interaction systems
- Cross-platform computer-use agents

## 🚀 Quick Start

Download the dataset here:

```
pip install -U huggingface_hub
huggingface-cli download xlangai/AgentNet --repo-type dataset --local-dir ./AgentNet
```

Use the following commands to merge and unzip the image archives (for example, the Ubuntu data):

```
cd path_to_your_zip_files

# Merge all the zips
zip -s 0 images.zip --out images-full.zip

# Unzip
unzip images-full.zip -d path_to_your_target_dir
```

## Action Space

The dataset uses PyAutoGUI actions and predefined agent-related actions:

<div align="center">
  <img src="https://cdn-uploads.huggingface.co/production/uploads/67b327cdd4665a0448eef7d5/FwA69rCLh81c-9CXaSE40.png" width="800" alt="AgentNet Action Space">
</div>

## Task Diversity

<div align="center">
  <img src="https://cdn-uploads.huggingface.co/production/uploads/67b327cdd4665a0448eef7d5/L281_EvnpQCeK9qShpqZX.png" width="400" alt="AgentNet Domain Distribution">
</div>

The dataset spans 4 main domains: Work (office tools, task management), Professional (creative design, development, data analysis, research), Daily (e-commerce, social media, entertainment), and System (configuration, web utilities). Tasks exhibit medium-high complexity with multi-application workflows, professional knowledge requirements, and uncommon feature usage.

<div style="text-align:center;">

## Data Synthesis Pipeline

</div>

Our data synthesis follows a 3-step process:

1. **Tool Annotation** ([AgentNetTool](https://agentnet-tool.xlang.ai/)): Cross-platform annotation tool for capturing screen recordings, mouse/keyboard signals, and accessibility trees
2. **Action Reduction & State-Action Matching** ([Code](https://github.com/xlang-ai/OpenCUA/tree/main/CoTGenerator)): Process raw demonstrations into compact state-action trajectories
3. **CoT Synthesis** ([Code](https://github.com/xlang-ai/OpenCUA/tree/main/CoTGenerator)): Generate structured reasoning (Observation, Thought, Action) for each step using a reflective long CoT framework

<div style="text-align:center;">

## Data Structure

</div>

Each JSONL file contains trajectories in the following structure:

```json
{
  "task_id": "20240927235321_5855063d-3f37-47a4-ab45-5247adfdb6f7",
  "instruction": "sort the table in ascending order based on the number column data in excel",
  "task_completed": false,
  "alignment_score": 7,
  "efficiency_score": 6,
  "task_difficulty": 3,
  "natural_language_task": "Could you help me sort this table in Excel...",
  "actual_task": "Sort a table in WPS Office...",
  "traj": [
    {
      "index": 0,
      "image": "ea83c4aa-a4b1-48af-b439-0de7ee7b8d3f.png",
      "value": {
        "observation": "I'm looking at a WPS Office Excel spreadsheet...",
        "thought": "Since this is the first action...",
        "action": "Click on cell C2, which contains the number...",
        "code": "pyautogui.click(x=0.1632, y=0.2711)",
        "last_step_correct": true,
        "last_step_redundant": false,
        "reflection": "The action has successfully selected cell C2..."
      }
    }
  ]
}
```

## AgentNet Training Data Structure

### Data Components

**Original Annotation:**

- `instruction`: Human-annotated task description

**Synthesized by Summarizer:**

- `natural_language_task`: More natural task description
- `actual_task`: Detailed task specification
- `task_completed`, `alignment_score`, `efficiency_score`, `task_difficulty`: Task quality metrics

**Trajectory Steps (`traj`):**

Each step contains training components:

- `observation`: Generated visual scene description
- `thought`: Reasoning and planning process
- `action`: Natural language action description
- `code`: Executable PyAutoGUI/function code

**Quality Control:**

- `last_step_correct`: Whether the current step is correct
- `last_step_redundant`: Whether the current step is redundant
- `reflection`: Generated by the reflector for error analysis

### Training Message Format

During training, the data is converted into a conversational format:

```python
# System Prompt
{"role": "system", "content": "You are a GUI agent..."}

# Multi-image History (previous steps)
{"role": "assistant", "content": "# Step 1 ## Action: Open Excel application "}
{"role": "user", "image": "screenshot1.png"}
{"role": "assistant", "content": "# Step 2 ## Action: Click on File menu "}
{"role": "user", "image": "screenshot2.png"}
{"role": "assistant", "content": "# Step 3 ## Action: Select data range "}

# Current Step Instruction
{"role": "user", "image": "current_screenshot.png"}
{"role": "user", "content": "# Task Instruction: sort the table in ascending order... Please generate the next move..."}

# Target Response (L2 CoT example. Loss only applied to this part.)
{"role": "assistant", "content": "# Step 4 ## Thought: I need to select the data range first... ## Action: Click on cell C2 to select the number column ## Code: ```python pyautogui.click(x=0.1632, y=0.2711) ```"}
```

The training supports different CoT levels (L1: Action+Code, L2: Thought+Action+Code, L3: Observation+Thought+Action+Code) and action history.
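
The three CoT levels can be illustrated with a small formatter that renders one step's `value` dict into a target response. This is a minimal sketch, not the OpenCUA training code: the helper name `format_target`, the `COT_LEVEL_FIELDS` table, and the exact line layout of the rendered text are assumptions made for illustration. Only the field names and the level definitions come from the description above.

```python
# Which step fields each CoT level includes, per the level definitions above.
COT_LEVEL_FIELDS = {
    "L1": ["action", "code"],
    "L2": ["thought", "action", "code"],
    "L3": ["observation", "thought", "action", "code"],
}

# Built programmatically so this example does not embed a literal code fence.
FENCE = "`" * 3


def format_target(step_value, step_number, level="L2"):
    """Render a step's `value` dict as a '# Step N' response (assumed layout)."""
    parts = [f"# Step {step_number}"]
    for field in COT_LEVEL_FIELDS[level]:
        text = step_value[field]
        if field == "code":
            # Wrap the executable code in a fenced python block.
            text = f"{FENCE}python\n{text}\n{FENCE}"
        parts.append(f"## {field.capitalize()}:\n{text}")
    return "\n".join(parts)


# Step values copied from the sample trajectory in the Data Structure section.
step_value = {
    "observation": "I'm looking at a WPS Office Excel spreadsheet...",
    "thought": "Since this is the first action...",
    "action": "Click on cell C2, which contains the number...",
    "code": "pyautogui.click(x=0.1632, y=0.2711)",
}

print(format_target(step_value, step_number=4, level="L2"))
```

Switching `level` to `"L1"` drops the thought section, and `"L3"` prepends the observation, so the same trajectory step can yield targets of different reasoning depth.
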
See **Appendix G** for detailed training data examples with complete system prompts and multi-turn conversations.

<div style="text-align:center;">

## Research and Commercial Use

</div>

OpenCUA (including the model, AgentNet dataset, tools, and code) may be used for **research, educational, and commercial purposes** under the **MIT License** (see `LICENSE`).

### Citation and Acknowledgement

If you use **OpenCUA models** and/or the **AgentNet dataset** in any **report, technical report, publication, thesis, presentation, blog post, documentation, or other publicly shared material**, we **kindly ask** that you include an explicit acknowledgement in the main text and cite the OpenCUA paper.

### Prohibited Uses

- The dataset may **not** be used for any purpose or activity that violates applicable laws or regulations in any jurisdiction
- Use for illegal, unethical, or harmful activities is strictly prohibited
- **Any unauthorized reproduction, distribution, or use that infringes upon intellectual property rights is strictly forbidden**
- Users must respect privacy rights and confidentiality of any data subjects represented in the dataset

### Disclaimer

- The authors, contributors, and copyright holders are **not responsible** for any illegal, unethical, or harmful use of the dataset, nor for any direct or indirect damages resulting from such use
- Use of the "AgentNet" name, logo, or trademarks does **not** imply any endorsement or affiliation unless separate written permission is obtained
- Users are solely responsible for ensuring their use complies with applicable laws and regulations
- **Users assume all responsibility for verifying that their intended use does not violate any third-party rights or applicable laws**

<div style="text-align:center;">

## Citation

</div>

If you use OpenCUA in your research, please cite our work:

```bibtex
@misc{wang2025opencuaopenfoundationscomputeruse,
      title={OpenCUA: Open Foundations for Computer-Use Agents},
      author={Xinyuan Wang and Bowen Wang and Dunjie Lu and Junlin Yang and Tianbao Xie and Junli Wang and Jiaqi Deng and Xiaole Guo and Yiheng Xu and Chen Henry Wu and Zhennan Shen and Zhuokai Li and Ryan Li and Xiaochuan Li and Junda Chen and Boyuan Zheng and Peihang Li and Fangyu Lei and Ruisheng Cao and Yeqiao Fu and Dongchan Shin and Martin Shin and Jiarui Hu and Yuyan Wang and Jixuan Chen and Yuxiao Ye and Danyang Zhang and Dikang Du and Hao Hu and Huarong Chen and Zaida Zhou and Haotian Yao and Ziwei Chen and Qizheng Gu and Yipu Wang and Heng Wang and Diyi Yang and Victor Zhong and Flood Sung and Y. Charles and Zhilin Yang and Tao Yu},
      year={2025},
      eprint={2508.09123},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2508.09123},
}
```

</div>
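
Returning to the trajectory format described under Data Structure, the sketch below shows one way to read a JSONL file and keep only the steps that pass the quality-control flags. It is an illustration under stated assumptions, not part of the OpenCUA tooling: the helper names `iter_trajectories` and `usable_steps`, the filtering policy (keep steps marked correct and not redundant), and the demo record (including its second step and `task_id`) are all hypothetical. Only the field names come from the README.

```python
import json
import tempfile
from pathlib import Path


def iter_trajectories(jsonl_path):
    """Yield one trajectory dict per non-empty line of a JSONL file."""
    with open(jsonl_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


def usable_steps(trajectory):
    """Keep steps flagged correct and not redundant (an assumed policy)."""
    return [
        step
        for step in trajectory["traj"]
        if step["value"].get("last_step_correct", True)
        and not step["value"].get("last_step_redundant", False)
    ]


# Made-up demo record that mirrors the documented fields.
sample = {
    "task_id": "demo-task",  # hypothetical ID, not a real AgentNet task
    "instruction": "sort the table in ascending order",
    "traj": [
        {
            "index": 0,
            "image": "a.png",
            "value": {
                "code": "pyautogui.click(x=0.1632, y=0.2711)",
                "last_step_correct": True,
                "last_step_redundant": False,
            },
        },
        {
            "index": 1,
            "image": "b.png",
            "value": {
                "code": "pyautogui.click(x=0.5, y=0.5)",  # invented step
                "last_step_correct": False,
                "last_step_redundant": False,
            },
        },
    ],
}

# Write the record to a temporary JSONL file, then read it back.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "demo.jsonl"
    path.write_text(json.dumps(sample) + "\n", encoding="utf-8")
    trajectories = list(iter_trajectories(path))

kept = usable_steps(trajectories[0])
print([step["index"] for step in kept])  # prints [0]
```

Reading line by line keeps memory use flat for large files, since each trajectory is parsed only when it is requested.
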

Provider:
maas
Created:
2025-08-18