Mind2Web-2
Source: ModelScope community. Updated: 2025-12-05. Indexed: 2025-07-05.
Download link:
https://modelscope.cn/datasets/osunlp/Mind2Web-2
Description:
# Mind2Web 2
Mind2Web 2 is an evaluation framework for agentic search. It uses an Agent-as-a-Judge methodology for comprehensive assessment of web automation agents.
<div align="center">
<img src="https://github.com/OSU-NLP-Group/Mind2Web-2/blob/main/assets/mind2web2_overview.jpg?raw=true" alt="Mind2Web 2 Overview" width="800"/>
<p><em>Mind2Web 2 features realistic and diverse long-horizon web search tasks and a novel Agent-as-a-Judge framework to evaluate complex, time-varying, and citation-backed answers.</em></p>
</div>
## 🔗 Links
- [🏠 Homepage](https://osu-nlp-group.github.io/Mind2Web-2)
- [🏆 Leaderboard](https://osu-nlp-group.github.io/Mind2Web-2/#leaderboard)
- [📖 Paper](https://arxiv.org/abs/2506.21506)
- [💻 Code](https://github.com/OSU-NLP-Group/Mind2Web-2)
## 🔄 Changelog
- **Oct 23, 2025:**
  - Updated several tasks to use dynamic relative time ranges instead of hardcoded time periods.
  - All evaluation scripts are released for both public dev set and test set.
- Jun 27, 2025: Initial Release.
For details and old versions, please refer to [changelog.md](changelog.md).
## 📝 Citation Information
If you find this work useful, please consider citing our paper:
```
@inproceedings{
gou2025mindweb,
title={Mind2Web 2: Evaluating Agentic Search with Agent-as-a-Judge},
author={Boyu Gou and Zanming Huang and Yuting Ning and Yu Gu and Michael Lin and Botao Yu and Andrei Kopanev and Weijian Qi and Yiheng Shu and Jiaman Wu and Chan Hee Song and Bernal Jimenez Gutierrez and Yifei Li and Zeyi Liao and Hanane Nour Moussa and TIANSHU ZHANG and Jian Xie and Tianci Xue and Shijie Chen and Boyuan Zheng and Kai Zhang and Zhaowei Cai and Viktor Rozgic and Morteza Ziyadi and Huan Sun and Yu Su},
booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems Datasets and Benchmarks Track},
year={2025},
url={https://openreview.net/forum?id=AUaW6DS9si}
}
```
Provider:
maas
Created:
2025-07-04



