Tevatron/browsecomp-plus-corpus
Source: Hugging Face | Updated: 2025-08-23 | Indexed: 2025-09-13
Download link:
https://hf-mirror.com/datasets/Tevatron/browsecomp-plus-corpus
Description:
BrowseComp-Plus is a new benchmark for Deep-Research systems that isolates the effects of the retriever and the LLM agent, enabling fair, transparent comparisons of Deep-Research agents. The benchmark sources challenging, reasoning-intensive queries from OpenAI's BrowseComp, but instead of searching the live web, it evaluates against a fixed, curated corpus of approximately 100,000 web documents. This corpus includes both human-verified evidence documents sufficient to answer the queries and mined hard negatives that keep the task challenging.
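Because the corpus is fixed, a retriever's output can be scored directly against the human-verified evidence documents, independently of the LLM agent. The sketch below illustrates this with an evidence recall@k metric. It assumes that each query comes with a set of evidence document IDs and that the retriever returns a ranked list of corpus document IDs. The function name, data layout, and toy IDs are hypothetical; this is not the benchmark's official evaluation script or metric.

```python
from typing import Dict, List, Set


def evidence_recall_at_k(
    ranked_docids: Dict[str, List[str]],
    evidence_docids: Dict[str, Set[str]],
    k: int,
) -> float:
    """Hypothetical metric: mean fraction of each query's evidence
    documents that appear in the retriever's top-k results."""
    scores = []
    for qid, gold in evidence_docids.items():
        if not gold:
            continue  # skip queries that have no labelled evidence
        # Take the top-k retrieved IDs; a missing run counts as an empty list.
        top_k = set(ranked_docids.get(qid, [])[:k])
        scores.append(len(top_k & gold) / len(gold))
    return sum(scores) / len(scores) if scores else 0.0


# Toy example with made-up query and document IDs.
runs = {"q1": ["d3", "d7", "d1"], "q2": ["d9", "d2"]}
gold = {"q1": {"d1", "d4"}, "q2": {"d2"}}
print(evidence_recall_at_k(runs, gold, k=2))  # 0.5
```

At k=2, q1 retrieves none of its two evidence documents (0.0) and q2 retrieves its only one (1.0), so the mean is 0.5. In this setup, hard negatives in the corpus only matter insofar as they displace evidence documents from the top-k.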
Provided by:
Tevatron



