microsoft/LiveDRBench
Source: Hugging Face | Updated: 2025-08-08 | Indexed: 2025-08-09
Download link:
https://hf-mirror.com/datasets/microsoft/LiveDRBench
Resource overview:
LiveDRBench is a benchmark dataset for evaluating Deep Research (DR) systems. It contains 100 challenging DR tasks covering scientific topics and public-interest events, collected between May and June 2025. The tasks fall into eight categories: SciFacts-Geo, SciFacts-Materials, NovelDatasets identification, NovelDatasets identification and extraction, NovelDatasets peer retrieval, PriorArt search, Entities, and Flight incidents. Each task consists of a brief task description, the expected output format, and a ground-truth JSON holding the claims and references that a system should uncover. The dataset can be loaded with the Hugging Face library, and an evaluation script is provided that scores DR systems using information-retrieval metrics.
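To illustrate what scoring with information-retrieval metrics can look like, here is a minimal sketch of set-based precision, recall, and F1 over claims. It is not the official LiveDRBench evaluation script, whose matching logic is not described on this page. The function name `precision_recall_f1`, the exact-string matching rule, and the sample claims are all assumptions made for illustration.

```python
from typing import Iterable, Tuple


def precision_recall_f1(
    predicted: Iterable[str], gold: Iterable[str]
) -> Tuple[float, float, float]:
    """Set-based precision, recall, and F1.

    Assumption: two claims match only if they are identical after
    trimming whitespace and lowercasing. A real evaluator would likely
    need a softer matching rule.
    """
    pred_set = {p.strip().lower() for p in predicted}
    gold_set = {g.strip().lower() for g in gold}
    hits = len(pred_set & gold_set)  # claims found in both sets

    precision = hits / len(pred_set) if pred_set else 0.0
    recall = hits / len(gold_set) if gold_set else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


# Hypothetical data: four ground-truth claims, three system claims.
gold_claims = ["claim a", "claim b", "claim c", "claim d"]
system_claims = ["Claim A", "claim c", "claim x"]

precision, recall, f1 = precision_recall_f1(system_claims, gold_claims)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

In this toy case two of the three system claims appear in the ground truth, so precision is 2/3, recall is 2/4, and F1 is 4/7.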
Provider:
microsoft



