Online-Mind2Web
Source: ModelScope (魔搭社区). Updated 2026-01-08; indexed 2025-07-05.
Download link: https://modelscope.cn/datasets/osunlp/Online-Mind2Web
<div align="center">
<a href="https://tiancixue.notion.site/An-Illusion-of-Progress-Assessing-the-Current-State-of-Web-Agents-1ac6cd2b9aac80719cd6f68374aaf4b4?pvs=4">Blog</a> |
<a href="https://arxiv.org/abs/2504.01382">Paper</a> |
<a href="https://github.com/OSU-NLP-Group/Online-Mind2Web">Code</a> |
<a href="https://huggingface.co/spaces/osunlp/Online_Mind2Web_Leaderboard">Leaderboard</a>
</div>
## Online-Mind2Web
Online-Mind2Web is the online version of [Mind2Web](https://osu-nlp-group.github.io/Mind2Web/): a more diverse and user-centric dataset that includes 300 high-quality tasks from 136 popular websites across various domains. The tasks span everyday user needs such as clothing, food, housing, and transportation, and are used to evaluate web agents' performance in a real-world online environment.
## News
- [2025/11/03] We’ve updated 36 tasks that are no longer valid or involve websites with CAPTCHA verification. Please check out the updated tasks! (We added a date suffix to the updated task IDs to distinguish them from the previous versions.)
### Data Fields
- "task_id" (str): Unique ID for each task.
- "website" (str): Website URL.
- "task_description" (str): Task description.
- "reference_length" (int): Number of steps required for a human annotator to complete the task.
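As a minimal sketch of how these fields might be checked before running an agent, the snippet below validates a single record against the four fields listed above. The `validate_task` helper and the `SAMPLE_TASK` record are hypothetical illustrations, not part of the dataset or its official tooling, and the sample values are made up.

```python
from typing import Any

# Field names and types as listed in the "Data Fields" section.
EXPECTED_FIELDS = {
    "task_id": str,
    "website": str,
    "task_description": str,
    "reference_length": int,
}


def validate_task(record: dict[str, Any]) -> list[str]:
    """Return a list of problems found in one task record (empty if none)."""
    problems = []
    for name, expected_type in EXPECTED_FIELDS.items():
        if name not in record:
            problems.append(f"missing field: {name}")
            continue
        value = record[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected_type):
            problems.append(f"wrong type for {name}: {type(value).__name__}")
    return problems


# Hypothetical record for illustration only; not a real task from the dataset.
SAMPLE_TASK = {
    "task_id": "0123456789abcdef0123456789abcdef",
    "website": "https://www.example.com/",
    "task_description": "Find a pair of running shoes under $100.",
    "reference_length": 5,
}

if __name__ == "__main__":
    print(validate_task(SAMPLE_TASK))
```

A record that passes returns an empty list; anything else names the offending field, which makes it easy to skip malformed rows rather than fail mid-run.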
### Update Tasks
We will regularly update Online-Mind2Web by replacing outdated or invalid tasks (e.g., due to website changes) to maintain its value as a rigorous benchmark for web agents. If you find that any tasks are outdated, please reach out to us, and we will update them.
To ensure fair comparisons, we will aim to keep the updated tasks on the same websites as before and with a similar reference length. Additionally, once agent performance saturates on Online-Mind2Web, we will also revise simple tasks to preserve its long-term value.
#### 2026/01/02
<details>
<summary>🧩 Updated Task IDs</summary>
['547f5729c59d5d12a457a3ebb74c31c6']
</details>
#### 2025/12/14
<details>
<summary>🧩 Updated Task IDs</summary>
['c698ff3fc0f6cbce39947c597ab5749b', '50d91eabde542906937ab4c5b6f8f23a']
</details>
#### 2025/12/11
<details>
<summary>🧩 Updated Task IDs</summary>
['b64f938af842f6a1b4489d0e49a785a7', '7e1047f4803237f319c004f7a7f6bccb', 'c94551d2b18f9ad0ab31b0bd98ca42e3', '47186fac8e7c7277af01144644eb4e0b', '78baf9dbe7c3532f7d7ef4cc22a7f065']
</details>
#### 2025/11/23
<details>
<summary>🧩 Updated Task IDs</summary>
['9829f3087ab1f9c8eba6b6dd2b831d25', '1bc154377120ec15b18dbabdba49c741']
</details>
#### 2025/11/03
**Update summary:**
Based on community feedback, we updated 36 tasks that were no longer valid or involved websites with CAPTCHA verification. The updated tasks were carefully designed to preserve similar difficulty and task types, ensuring fair comparison with prior results.
<details>
<summary>🧩 Updated Task IDs</summary>
['b7258ee05d75e6c50673a59914db412e', '824eb7bb0ef1ce40bfd49c12182d9428', '8f2611047de227a2ca8bda13f6e2e5fb', '62f1626ce249c31098854f8b38bdd6cf', '79f0bd7df6e685f30f20025cc6755c0a', '5e1b8254c123c80178cc28e0afdb14f0', '816851ff92ff0219acf4364dcc2c4692', 'e7301bb694871429bf2eb36c3a72186c', '3c1ffc3f494e423b3c434c79e35da8f3', '9f1cba613830ca1c6a58f9498c06e679', '9c97bab9c2abfb90a426cbe9addae8d0', '2fc51dd3febd447f0fdcdabca8d944ce', 'eb323dc584156d0eb3a2b90bb8c4b791', 'a0a18ca6a3529f3e97c771aadd42d3a0', 'e7f6cca9a8875f98fee3b711ead3a444', 'f2be37a9a60fbc25b6b11cf622d17352', '2d5a7f95f951a26838289dfd629ae850', '502e864440283214e0180645015f568b', '3adeea7627f4343069f38adae40f73d0', '8f80e64e44e1fada018997b2fe869683', '0a0fa834ce41b5297c6474293383759d', '64345c365f544375357c7b67917f08a0', '33bd2cdcea4fcc42a09a8a1e4e5841c6', '3dca7cbe7d086619d837ff9f5312cebc', '11857213ca01510f12813740afd59918', 'd730f4ff450da1bd60a836163736ef6a', 'fe33894188d20d7469f37a9fd855e7ff', 'e43cbc8a0bf9e999884928d11006f894', 'c577a14301a725e09ccd269a3e0b271e', '2c8ef01a92c71ba9ef2e59bb17eea2b3', '636b07af4dd97c1793733db1fd1b90b8', 'd8e2a81fa621ce4737e5ea85671b630e', '199be0b54a436daee74247971fc684ee', 'd1807551297ac60ecaaabbd2a2ed301a', 'dd44c665cec1e9c929a4c5f074e7844a', '1ab384fb3a791edfb410213cc6b82151']
</details>
#### 2025/04/05
<details>
<summary>🧩 Updated Task IDs</summary>
['c03ee2be3d73556ab789c0ad1cbd3451', 'c181f903ec1107b850032c17cad88393', '2c8ef01a92c71ba9ef2e59bb17eea2b3', 'd8e2a81fa621ce4737e5ea85671b630e', '63d6866fc000fcb1f153e07604bd1395', '199be0b54a436daee74247971fc684ee']
</details>
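When comparing results across benchmark versions, it can help to know which update rounds touched a given task. The sketch below looks a task ID up in the update log. It rests on two assumptions that the card does not spell out: that the IDs listed in the log are the 32-character base IDs, and that the date suffix mentioned in the News section is appended after those 32 characters. The helper names and the suffix format used in the example are hypothetical, and only the shorter rounds are copied here for brevity.

```python
# Update rounds copied from the log above (longer rounds omitted for brevity).
UPDATES = {
    "2026/01/02": ["547f5729c59d5d12a457a3ebb74c31c6"],
    "2025/12/14": [
        "c698ff3fc0f6cbce39947c597ab5749b",
        "50d91eabde542906937ab4c5b6f8f23a",
    ],
    "2025/11/23": [
        "9829f3087ab1f9c8eba6b6dd2b831d25",
        "1bc154377120ec15b18dbabdba49c741",
    ],
    "2025/04/05": [
        "c03ee2be3d73556ab789c0ad1cbd3451",
        "c181f903ec1107b850032c17cad88393",
        "2c8ef01a92c71ba9ef2e59bb17eea2b3",
        "d8e2a81fa621ce4737e5ea85671b630e",
        "63d6866fc000fcb1f153e07604bd1395",
        "199be0b54a436daee74247971fc684ee",
    ],
}

# Assumption: base task IDs are 32 hexadecimal characters, as in the log.
BASE_ID_LENGTH = 32


def base_id(task_id: str) -> str:
    """Keep the first 32 characters, dropping any appended suffix."""
    return task_id[:BASE_ID_LENGTH]


def updates_for(task_id: str) -> list[str]:
    """Return the dates of every listed update round that includes this task."""
    key = base_id(task_id)
    return [date for date, ids in UPDATES.items() if key in ids]


if __name__ == "__main__":
    # The "_20260102" suffix format is a guess used only for illustration.
    print(updates_for("547f5729c59d5d12a457a3ebb74c31c6_20260102"))
```

Matching on the 32-character prefix avoids depending on the exact suffix format, at the cost of assuming the suffix always comes last.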
### Disclaimer
This dataset was collected and released solely for research purposes, with the goal of making the web more accessible via language technologies. The authors are strongly against any potential harmful use of the data or technology to any party.
### Citation Information
Note: Online-Mind2Web is derived from the original Mind2Web dataset. We kindly ask that you cite both the original and this work when using or referencing the data.
```
@article{xue2025illusionprogressassessingcurrent,
title={An Illusion of Progress? Assessing the Current State of Web Agents},
author={Tianci Xue and Weijian Qi and Tianneng Shi and Chan Hee Song and Boyu Gou and Dawn Song and Huan Sun and Yu Su},
year={2025},
eprint={2504.01382},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2504.01382},
}
@inproceedings{deng2023mind2web,
author = {Deng, Xiang and Gu, Yu and Zheng, Boyuan and Chen, Shijie and Stevens, Sam and Wang, Boshi and Sun, Huan and Su, Yu},
booktitle = {Advances in Neural Information Processing Systems},
editor = {A. Oh and T. Naumann and A. Globerson and K. Saenko and M. Hardt and S. Levine},
pages = {28091--28114},
publisher = {Curran Associates, Inc.},
title = {Mind2Web: Towards a Generalist Agent for the Web},
url = {https://proceedings.neurips.cc/paper_files/paper/2023/file/5950bf290a1570ea401bf98882128160-Paper-Datasets_and_Benchmarks.pdf},
volume = {36},
year = {2023}
}
```
Provider: maas
Created: 2025-07-04



