
2077AIDataFoundation/VeriWeb

Source: Hugging Face. Updated 2026-01-21. Indexed 2026-01-03.
Download link:
https://hf-mirror.com/datasets/2077AIDataFoundation/VeriWeb
Description:

VeriWeb is a novel verifiable long-chain web benchmark designed to facilitate the evaluation and development of web agents within realistic web environments. Unlike existing efforts that mainly focus on single-fact retrieval and rely on outcome-only verification, VeriWeb emphasizes long-chain complexity and subtask-level verifiability to better reflect realistic knowledge-intensive scenarios. The dataset includes 302 realistic information-seeking tasks across 5 real-world domains, with each task decomposed into multiple interdependent subtasks. Additionally, the dataset features human-expert annotated task instructions, subtask decompositions, and answer annotations. The dataset structure includes task annotations and video recordings of task execution.
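
The contrast between outcome-only verification and subtask-level verification can be illustrated with a small sketch. This is not the dataset's schema or its official scorer: the `Subtask` class, its field names, the exact-match comparison, and the toy task are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Subtask:
    # Hypothetical fields; the real annotation format may differ.
    question: str
    expected: str


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def outcome_only(subtasks: list, answers: list) -> bool:
    """Check only the final answer, as outcome-only verification would."""
    return normalize(answers[-1]) == normalize(subtasks[-1].expected)


def subtask_level(subtasks: list, answers: list) -> list:
    """Check every step, returning one pass/fail flag per subtask."""
    return [
        normalize(answer) == normalize(subtask.expected)
        for subtask, answer in zip(subtasks, answers)
    ]


# Toy three-step chain (invented data, not taken from VeriWeb).
subtasks = [
    Subtask("Which city hosts the conference?", "Lisbon"),
    Subtask("In which year was the venue opened?", "1998"),
    Subtask("Which country is that city in?", "Portugal"),
]

# The agent gets the middle step wrong but still lands on the right final answer.
answers = ["lisbon", "2001", " Portugal "]

print(outcome_only(subtasks, answers))   # final answer alone
print(subtask_level(subtasks, answers))  # per-step results
```

In this toy run the final answer matches even though an intermediate step is wrong, which is the kind of failure that only per-subtask checking exposes.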
Provided by:
2077AIDataFoundation