
WebWalkerQA

ModelScope community · Updated 2026-01-09 · Indexed 2025-09-20
Download link:
https://modelscope.cn/datasets/AI-ModelScope/WebWalkerQA
Description:
📑 The WebWalkerQA paper is available on [arXiv](https://arxiv.org/pdf/2501.07572).

📊 The dataset is a collection of **680** question-answer pairs from the WebWalker dataset.

🙋 The dataset is distributed as a JSON file. Each record has the keys `Question`, `Answer`, `Root_Url`, and `Info`. The `Info` field holds further details: `Hop`, `Domain`, `Language`, `Difficulty_Level`, `Source_Website`, and `Golden_Path`.

```
{
    "Question": "When is the paper submission deadline for the ACL 2025 Industry Track, and what is the venue address for the conference?",
    "Answer": "The paper submission deadline for the ACL 2025 Industry Track is March 21, 2025. The conference will be held in Brune-Kreisky-Platz 1.",
    "Root_Url": "https://2025.aclweb.org/",
    "Info": {
        "Hop": "multi-source",
        "Domain": "Conference",
        "Language": "English",
        "Difficulty_Level": "Medium",
        "Source_Website": ["https://2025.aclweb.org/calls/industry_track/", "https://2025.aclweb.org/venue/"],
        "Golden_Path": ["root->call>student_research_workshop", "root->venue"]
    }
}
```

🏋️ We also release a **15k**-example silver dataset which, although not yet carefully human-verified, can serve as supplementary **training data** to enhance agent performance.

🙋 If you have any questions, please feel free to contact us via a [GitHub issue](https://github.com/Alibaba-NLP/WebWalker/issue).

⚙️ Because the web changes quickly, the dataset may contain outdated information, such as a golden path or source website. We encourage you to contribute to the dataset by submitting a pull request to WebWalkerQA or by contacting us.

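The record layout described above can be explored with a few lines of Python. This is a minimal sketch, not an official loader: it assumes the downloaded file holds a single JSON array of records, and the helper names `load_records`, `filter_records`, and `count_by` are hypothetical. The demo runs on the one example record from this card, so it needs no download.

```python
import json
from collections import Counter
from typing import Any

Record = dict[str, Any]


def load_records(path: str) -> list[Record]:
    """Load records from disk.

    Assumption: the file is one JSON array of objects shaped like the
    example record; adjust if the actual file is laid out differently.
    """
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def filter_records(records: list[Record], **info_filters: str) -> list[Record]:
    """Keep records whose Info fields equal every given filter value."""
    return [
        r for r in records
        if all(r.get("Info", {}).get(k) == v for k, v in info_filters.items())
    ]


def count_by(records: list[Record], info_key: str) -> Counter:
    """Count records per value of one Info field (e.g. Domain or Hop)."""
    return Counter(r.get("Info", {}).get(info_key) for r in records)


# The example record from the dataset card, used here as stand-in data.
sample: list[Record] = [
    {
        "Question": "When is the paper submission deadline for the ACL 2025 "
                    "Industry Track, and what is the venue address for the conference?",
        "Answer": "The paper submission deadline for the ACL 2025 Industry Track "
                  "is March 21, 2025. The conference will be held in Brune-Kreisky-Platz 1.",
        "Root_Url": "https://2025.aclweb.org/",
        "Info": {
            "Hop": "multi-source",
            "Domain": "Conference",
            "Language": "English",
            "Difficulty_Level": "Medium",
            "Source_Website": [
                "https://2025.aclweb.org/calls/industry_track/",
                "https://2025.aclweb.org/venue/",
            ],
            "Golden_Path": ["root->call>student_research_workshop", "root->venue"],
        },
    }
]

medium_english = filter_records(sample, Difficulty_Level="Medium", Language="English")
print(len(medium_english), "matching record(s)")
print(count_by(sample, "Domain"))
```

With the real file, `load_records` would replace `sample`; the file path is whatever name the download produces, which this card does not specify.
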
💡 If you find this dataset useful, please consider citing our paper:

```bibtex
@article{wu2025webwalker,
  title={Webwalker: Benchmarking llms in web traversal},
  author={Wu, Jialong and Yin, Wenbiao and Jiang, Yong and Wang, Zhenglin and Xi, Zekun and Fang, Runnan and Zhang, Linhai and He, Yulan and Zhou, Deyu and Xie, Pengjun and others},
  journal={arXiv preprint arXiv:2501.07572},
  year={2025}
}
```

Provider:
maas
Created:
2025-09-14