
omniact

Source: ModelScope community. Updated 2025-12-05; indexed 2025-09-20.
Download link:
https://modelscope.cn/datasets/Writer-Org/omniact
Resource description:
<img src="intro.png" width="700" title="OmniACT">

Dataset for [OmniACT: A Dataset and Benchmark for Enabling Multimodal Generalist Autonomous Agents for Desktop and Web](https://arxiv.org/abs/2402.17553)

Splits:

| split_name | count |
|------------|-------|
| train      | 6788  |
| test       | 2020  |
| val        | 991   |

Example datapoint:
```json
"2849": {
    "task": "data/tasks/desktop/ibooks/task_1.30.txt",
    "image": "data/data/desktop/ibooks/screen_1.png",
    "box": "data/metadata/desktop/boxes/ibooks/screen_1.json"
},
```

where:

- `task` - path to a text file containing the natural language description ("Task") along with the corresponding PyAutoGUI code ("Output Script"):
```text
Task: Navigate to see the upcoming titles
Output Script:
pyautogui.moveTo(1881.5,1116.0)
```
- `image` - the screen image on which the action is performed
- `box` - the metadata used during evaluation. This JSON file contains labels for the interactable elements on the screen and their corresponding bounding boxes. <i>It should not be used during inference on the test set.</i>

<img src="screen_1.png" width="700" title="example screen image">

To cite OmniACT, please use:
```
@misc{kapoor2024omniact,
      title={OmniACT: A Dataset and Benchmark for Enabling Multimodal Generalist Autonomous Agents for Desktop and Web},
      author={Raghav Kapoor and Yash Parag Butala and Melisa Russak and Jing Yu Koh and Kiran Kamble and Waseem Alshikh and Ruslan Salakhutdinov},
      year={2024},
      eprint={2402.17553},
      archivePrefix={arXiv},
      primaryClass={cs.AI}
}
```
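
As a usage note on the `task` file format described above, the following is a minimal sketch of splitting a task file's text into its description and its PyAutoGUI script. It assumes only that the text contains the `Task:` and `Output Script:` markers in that order, as in the example; the names `TaskExample` and `parse_task_text` are hypothetical helpers, not part of the dataset or of any library.

```python
from dataclasses import dataclass


@dataclass
class TaskExample:
    description: str  # text that follows the "Task:" marker
    script: str       # PyAutoGUI code that follows the "Output Script:" marker


def parse_task_text(text: str) -> TaskExample:
    """Split the contents of a task file into description and script.

    Assumption: the file starts with "Task:" and contains "Output Script:"
    exactly as in the README example; whitespace around them is ignored.
    """
    head, sep, tail = text.partition("Output Script:")
    head = head.strip()
    if not sep or not head.startswith("Task:"):
        raise ValueError("expected 'Task:' followed by 'Output Script:'")
    description = head[len("Task:"):].strip()
    return TaskExample(description=description, script=tail.strip())


# The sample text is copied from the README example above.
sample = (
    "Task: Navigate to see the upcoming titles\n"
    "Output Script:\n"
    "pyautogui.moveTo(1881.5,1116.0)\n"
)
example = parse_task_text(sample)
print(example.description)
print(example.script)
```

The script is kept as a plain string rather than executed, since running generated PyAutoGUI code moves the real mouse pointer. In practice the text would be read from the path stored in the datapoint's `task` field.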

Provider:
maas
Created:
2025-09-15