discoverybench
ModelScope Community · updated 2025-07-24 · indexed 2025-05-31
Download link:
https://modelscope.cn/datasets/allenai/discoverybench
Description:
Data-driven Discovery Benchmark from the paper:
"DiscoveryBench: Towards Data-Driven Discovery with Large Language Models"
# 🔭 Overview
DiscoveryBench is designed to systematically assess current model capabilities in data-driven discovery tasks and provide a useful resource for improving them. Each DiscoveryBench task consists of a goal and dataset(s). Solving the task requires both statistical analysis and semantic reasoning. A faceted evaluation allows open-ended final answers to be rigorously evaluated.
# 🌟 Dataset Structure
This repo is structured as follows:
- `discoverybench`: contains both the real and the synthetic benchmark folders.
  - Each benchmark has train and test partitions.
  - Each folder in a partition contains shared query-dataset files (usually CSV) and multiple `metadata_*.json` files.
  - Each `metadata_*.json` file contains one or more queries, all of which can be answered by the gold hypothesis stored in `answer_key` (explained below).
- `answer_key`: gold hypotheses for the real and synthetic discovery tasks.
  - Each record in the answer key is indexed by the dataset-folder name, `metadata_id`, and `qid`.
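As a minimal sketch of how this layout might be traversed, the code below assumes the nesting is `discoverybench/<benchmark>/<partition>/<dataset-folder>/metadata_<id>.json` and that the suffix of each metadata filename is its `metadata_id`; neither assumption is confirmed by this card. `TaskFile`, `parse_metadata_path`, `iter_metadata_files`, and `lookup_gold` are hypothetical helpers, not part of the DiscoveryBench repository. Because the card does not describe the JSON schema of the metadata files or the file format of the answer key, the sketch relies only on file paths and represents the answer key as a plain dict keyed by (dataset-folder name, `metadata_id`, `qid`).

```python
from pathlib import Path
from typing import Dict, Iterator, NamedTuple, Tuple


class TaskFile(NamedTuple):
    """Location of one metadata_*.json file (hypothetical helper type)."""
    benchmark: str       # real or synthetic benchmark folder (actual names assumed)
    partition: str       # train or test partition folder
    dataset_folder: str  # folder holding the query-dataset files (usually CSV)
    metadata_id: str     # suffix of metadata_<id>.json, assumed to be the metadata_id
    path: Path


def parse_metadata_path(path: Path) -> TaskFile:
    """Split <benchmark>/<partition>/<dataset-folder>/metadata_<id>.json into parts."""
    dataset_dir = path.parent
    partition_dir = dataset_dir.parent
    benchmark_dir = partition_dir.parent
    metadata_id = path.stem[len("metadata_"):]  # "metadata_0" -> "0"
    return TaskFile(benchmark_dir.name, partition_dir.name,
                    dataset_dir.name, metadata_id, path)


def iter_metadata_files(root: Path) -> Iterator[TaskFile]:
    """Yield every metadata_*.json exactly three folder levels below `root`."""
    for path in sorted(root.glob("*/*/*/metadata_*.json")):
        yield parse_metadata_path(path)


# Answer-key lookup: (dataset-folder name, metadata_id, qid) -> gold hypothesis.
AnswerKey = Dict[Tuple[str, str, str], str]


def lookup_gold(answer_key: AnswerKey, dataset_folder: str,
                metadata_id: str, qid: str) -> str:
    """Return the gold hypothesis for one query; raises KeyError if absent."""
    return answer_key[(dataset_folder, metadata_id, qid)]
```

For instance, the hypothetical path `discoverybench/real/test/my_dataset/metadata_0.json` would be parsed into dataset folder `my_dataset` and `metadata_id` `"0"`. Keying the answer key on the same three fields as the card keeps each query's gold hypothesis one dict lookup away; loading the real answer-key file into that dict is left open here.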
# 🚀 Agents and Evaluation
More resources on the dataset, agents, and evaluation protocols can be found [here](https://github.com/allenai/discoverybench/tree/main).
The `discovery_agent.py` file includes code for discovery agents. These agents are designed to perform data-driven discovery tasks by leveraging different large language models.
The `discovery_eval.py` script contains the code for evaluating the performance of these agents.
The repository's README gives detailed instructions for running and evaluating the agents, so that researchers and developers can use DiscoveryBench efficiently in their own data-driven discovery projects.
# ✍️ Citation
If you find our work or dataset helpful, please cite:
```
@article{majumder2024discoverybench,
author = "Bodhisattwa Prasad Majumder and Harshit Surana and Dhruv Agarwal and Bhavana Dalvi Mishra and Abhijeetsingh Meena and Aryan Prakhar and Tirth Vora and Tushar Khot and Ashish Sabharwal and Peter Clark",
title = "DiscoveryBench: Towards Data-Driven Discovery with Large Language Models",
journal = "arXiv",
year = "2024",
}
```
Provided by:
maas
Created:
2025-05-27



