omniact
Source: ModelScope community (魔搭社区). Updated 2025-12-05; indexed 2025-09-20.
Download link:
https://modelscope.cn/datasets/Writer-Org/omniact
Resource description:
<img src="intro.png" width="700" title="OmniACT">
Dataset for [OmniACT: A Dataset and Benchmark for Enabling Multimodal Generalist Autonomous Agents for Desktop and Web](https://arxiv.org/abs/2402.17553)
Splits:
| split_name | count |
|------------|-------|
| train | 6788 |
| test | 2020 |
| val | 991 |
Example datapoint:
```json
"2849": {
"task": "data/tasks/desktop/ibooks/task_1.30.txt",
"image": "data/data/desktop/ibooks/screen_1.png",
"box": "data/metadata/desktop/boxes/ibooks/screen_1.json"
},
```
where:
- `task` - path to a text file containing a natural language description ("Task") along with the corresponding PyAutoGUI code ("Output Script"):
```text
Task: Navigate to see the upcoming titles
Output Script:
pyautogui.moveTo(1881.5,1116.0)
```
- `image` - the screen image on which the action is performed
- `box` - the metadata used during evaluation. This JSON file contains labels for the interactable elements on the screen and their corresponding bounding boxes. <i>These annotations should not be used when running inference on the test set.</i>
<img src="screen_1.png" width="700" title="example screen image">
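As a minimal sketch of how a `task` file could be consumed, the snippet below splits its text into the description and the PyAutoGUI script. It assumes every task file follows the two-part layout shown in the example above (`Task:` followed by `Output Script:`); `parse_task_text` is a hypothetical helper written for illustration, not part of the dataset's tooling.

```python
def parse_task_text(text: str) -> tuple[str, str]:
    """Split the contents of a task file into (description, script).

    Assumes the layout shown in the example datapoint: a "Task:" line,
    then an "Output Script:" marker followed by PyAutoGUI code.
    """
    header, marker, script = text.partition("Output Script:")
    if not marker:
        raise ValueError("missing 'Output Script:' marker")
    description = header.strip()
    # Drop the leading "Task:" label if it is present.
    if description.startswith("Task:"):
        description = description[len("Task:"):].strip()
    return description, script.strip()


# Contents of the example task file quoted above.
example_text = """Task: Navigate to see the upcoming titles
Output Script:
pyautogui.moveTo(1881.5,1116.0)
"""

description, script = parse_task_text(example_text)
print(description)
print(script)
```

The script is returned as a plain string rather than executed, so predicted and reference actions can be compared without moving the mouse on the machine running the code.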
To cite OmniACT, please use:
```
@misc{kapoor2024omniact,
title={OmniACT: A Dataset and Benchmark for Enabling Multimodal Generalist Autonomous Agents for Desktop and Web},
author={Raghav Kapoor and Yash Parag Butala and Melisa Russak and Jing Yu Koh and Kiran Kamble and Waseem Alshikh and Ruslan Salakhutdinov},
year={2024},
eprint={2402.17553},
archivePrefix={arXiv},
primaryClass={cs.AI}
}
```
Provider:
maas
Created:
2025-09-15



