omniact

Name: omniact
Creator: maas
Published: 2025-12-05 16:50:33
License: No description available

ModelScope Community · Updated 2025-12-05 · Indexed 2025-09-20

Download link:

https://modelscope.cn/datasets/Writer-Org/omniact

Resource overview:

<img src="intro.png" width="700" title="OmniACT">

Dataset for [OmniACT: A Dataset and Benchmark for Enabling Multimodal Generalist Autonomous Agents for Desktop and Web](https://arxiv.org/abs/2402.17553).

Splits:

| split_name | count |
|------------|-------|
| train      | 6788  |
| test       | 2020  |
| val        | 991   |

Example datapoint:

```json
"2849": {
    "task": "data/tasks/desktop/ibooks/task_1.30.txt",
    "image": "data/data/desktop/ibooks/screen_1.png",
    "box": "data/metadata/desktop/boxes/ibooks/screen_1.json"
},
```

where:

- `task` contains the natural-language description ("Task") along with the corresponding PyAutoGUI code ("Output Script"):

  ```text
  Task: Navigate to see the upcoming titles
  Output Script:
  pyautogui.moveTo(1881.5,1116.0)
  ```

- `image` is the screen image on which the action is performed.
- `box` is the metadata used during evaluation. This JSON file contains labels for the interactable elements on the screen and their corresponding bounding boxes. <i>It should not be used during inference on the test set.</i>

<img src="screen_1.png" width="700" title="example screen image">

To cite OmniACT, please use:

```
@misc{kapoor2024omniact,
      title={OmniACT: A Dataset and Benchmark for Enabling Multimodal Generalist Autonomous Agents for Desktop and Web},
      author={Raghav Kapoor and Yash Parag Butala and Melisa Russak and Jing Yu Koh and Kiran Kamble and Waseem Alshikh and Ruslan Salakhutdinov},
      year={2024},
      eprint={2402.17553},
      archivePrefix={arXiv},
      primaryClass={cs.AI}
}
```
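The split index and task-file formats described above can be read with a few lines of Python. This is a minimal sketch under stated assumptions: the README shows only a fragment of the index, so the sketch assumes each split's index is a single JSON object mapping ids to `task`/`image`/`box` paths, that those paths are relative to a local copy of the dataset (called `root` here, a hypothetical name), and that every task file follows the `Task:` / `Output Script:` layout of the example. `load_split`, `parse_task` and `Example` are illustrative helpers, not part of any official OmniACT tooling.

```python
import json
from pathlib import Path
from typing import List, NamedTuple, Optional, Tuple


class Example(NamedTuple):
    """One datapoint: its id plus paths to the task file, screenshot and box metadata."""
    uid: str
    task: Path
    image: Path
    box: Optional[Path]  # None when the evaluation-only box metadata is withheld


def load_split(index_json: str, root: Path, include_boxes: bool = True) -> List[Example]:
    """Parse a split index (assumed shape: {id: {"task", "image", "box"}}) into Examples.

    Paths are resolved against `root`, a hypothetical local copy of the dataset.
    """
    entries = json.loads(index_json)
    return [
        Example(
            uid=uid,
            task=root / entry["task"],
            image=root / entry["image"],
            # Boxes are for evaluation only; leave them out for test-set inference.
            box=root / entry["box"] if include_boxes else None,
        )
        for uid, entry in entries.items()
    ]


def parse_task(text: str) -> Tuple[str, str]:
    """Split task-file text into (natural-language task, PyAutoGUI script)."""
    head, marker, script = text.partition("Output Script:")
    if not marker:
        raise ValueError("task file has no 'Output Script:' section")
    task = head.strip()
    if task.startswith("Task:"):
        task = task[len("Task:"):].strip()
    return task, script.strip()
```

A task file on disk would be read with `parse_task(example.task.read_text())`. Passing `include_boxes=False` when building test-set inputs keeps the evaluation-only box metadata out of the inference pipeline by construction.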

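Because each `Output Script` is plain PyAutoGUI code and each `box` file labels the interactable elements on screen, an evaluator can inspect predicted actions without executing them. The sketch below is an illustration only, not the benchmark's official scoring, which is defined in the paper: `parse_actions` reads a script with Python's `ast` module, and `element_at` maps a screen coordinate to a labeled element. The box-file schema assumed by `element_at` (`top_left`/`bottom_right` keys per label) is hypothetical; the README only says the file holds labels and bounding boxes. Like the box files themselves, `element_at` belongs on the evaluation side only.

```python
import ast
from typing import Any, Dict, List, Optional, Tuple

# (PyAutoGUI function name, positional args, keyword args)
Action = Tuple[str, List[Any], Dict[str, Any]]


def parse_actions(script: str) -> List[Action]:
    """Turn a PyAutoGUI script into (function, args, kwargs) triples.

    The script is parsed with ast rather than executed, so nothing on screen moves.
    Only statements of the form pyautogui.<name>(<literal args>) are accepted.
    """
    actions: List[Action] = []
    for node in ast.parse(script).body:
        call = node.value if isinstance(node, ast.Expr) else None
        if not (
            isinstance(call, ast.Call)
            and isinstance(call.func, ast.Attribute)
            and isinstance(call.func.value, ast.Name)
            and call.func.value.id == "pyautogui"
        ):
            raise ValueError(f"unsupported statement: {ast.unparse(node)}")
        args = [ast.literal_eval(arg) for arg in call.args]
        kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in call.keywords}
        actions.append((call.func.attr, args, kwargs))
    return actions


def element_at(x: float, y: float, boxes: Dict[str, Dict[str, List[float]]]) -> Optional[str]:
    """Return the label of the first box that contains (x, y), or None.

    Hypothetical schema: {label: {"top_left": [x0, y0], "bottom_right": [x1, y1]}}.
    Check the real screen_*.json files before relying on these key names.
    """
    for label, box in boxes.items():
        (x0, y0), (x1, y1) = box["top_left"], box["bottom_right"]
        if x0 <= x <= x1 and y0 <= y <= y1:
            return label
    return None
```

For a `moveTo` or `click` action, PyAutoGUI's first two positional arguments are the x and y coordinates, so `element_at(*args[:2], boxes)` names the element an action targets. Parsing with `ast.literal_eval` instead of `exec` means a malformed or malicious prediction raises an error rather than moving the mouse.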

Provider:

maas

Created:

2025-09-15
