BRIGHT

Name: BRIGHT
Creator: maas
Published: 2025-12-05 16:47:01
License: No description available

ModelScope community: updated 2025-12-05, indexed 2025-08-23

Download link:

https://modelscope.cn/datasets/xlangai/BRIGHT

Resource description:

# BRIGHT benchmark

BRIGHT is the first text retrieval benchmark that requires intensive reasoning to retrieve relevant documents. The queries are collected from diverse domains (StackExchange, LeetCode, and math competitions), and all are sourced from realistic human data. Experiments show that existing retrieval models perform poorly on BRIGHT: the highest score is only 22.1 nDCG@10. BRIGHT therefore provides a good testbed for future retrieval research in more realistic and challenging settings. More details are on the [project page](https://brightbenchmark.github.io/).

## Dataset Structure

All datasets are unified into a consistent format and organized into three subsets: `examples`, `documents`, and `long_documents`.

* `examples`:
  * `query`: the retrieval query
  * `reasoning`: the gold reasoning steps annotated by humans (these help people understand why a document is relevant to a query, but they are not used in any experiment in the paper)
  * `id`: the index of the instance
  * `excluded_ids`: a list of document ids (strings) to exclude during evaluation (only for `theoremqa`/`aops`/`leetcode`)
  * `gold_ids_long`: a list of ids (strings) of the ground-truth documents, corresponding to the `id`s in the `long_documents` subset
  * `gold_ids`: a list of ids (strings) of the ground-truth documents, corresponding to the `id`s in the `documents` subset
* `documents`:
  * `id`: the index of the document
  * `content`: the document content (a short passage split from a complete web page, blog post, etc., or a problem-and-solution pair)
* `long_documents` (not applicable to `theoremqa`/`aops`/`leetcode`):
  * `id`: the index of the document
  * `content`: the document content (the long version, corresponding to the complete web page, blog post, etc.)

## Dataset Statistics

<img src="statistics.png" width="80%" alt="BRIGHT statistics">

## Data Loading

Each dataset can be easily loaded.
For example, to load the biology examples:

```python
from datasets import load_dataset

# Load the `examples` subset and select the biology domain split.
data = load_dataset('xlangai/BRIGHT', 'examples')['biology']
```

## Citation

If you find our work helpful, please cite us:

```bibtex
@misc{BRIGHT,
  title={BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval},
  author={Su, Hongjin and Yen, Howard and Xia, Mengzhou and Shi, Weijia and Muennighoff, Niklas and Wang, Han-yu and Liu, Haisu and Shi, Quan and Siegel, Zachary S and Tang, Michael and Sun, Ruoxi and Yoon, Jinsung and Arik, Sercan O and Chen, Danqi and Yu, Tao},
  url={https://arxiv.org/abs/2407.12883},
  year={2024},
}
```
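The Dataset Structure section links each example to its ground-truth documents through `gold_ids` (ids in `documents`) or `gold_ids_long` (ids in `long_documents`). Below is a minimal sketch of resolving those ids to document text. The helper names `build_corpus` and `gold_contents` are hypothetical, and the toy records only mimic the documented field layout; they are not real BRIGHT data.

```python
from typing import Dict, List, Sequence


def build_corpus(documents: Sequence[dict]) -> Dict[str, str]:
    """Map each document `id` to its `content` (works for `documents` or `long_documents`)."""
    return {doc["id"]: doc["content"] for doc in documents}


def gold_contents(example: dict, corpus: Dict[str, str], long: bool = False) -> List[str]:
    """Return the text of the ground-truth documents for one example.

    Hypothetical helper: `long=True` reads `gold_ids_long` (ids in `long_documents`),
    otherwise `gold_ids` (ids in `documents`) is used.
    """
    key = "gold_ids_long" if long else "gold_ids"
    return [corpus[doc_id] for doc_id in example[key]]


if __name__ == "__main__":
    # Toy records mimicking the documented fields (not real BRIGHT data).
    documents = [
        {"id": "doc_a", "content": "Short passage A"},
        {"id": "doc_b", "content": "Short passage B"},
    ]
    example = {"query": "toy query", "gold_ids": ["doc_b"], "gold_ids_long": [], "excluded_ids": []}
    corpus = build_corpus(documents)
    print(gold_contents(example, corpus))  # ['Short passage B']
```

With real data, the document records would presumably come from `load_dataset('xlangai/BRIGHT', 'documents')['biology']`, assuming the `documents` subset is split by domain in the same way as `examples`; iterating a `datasets.Dataset` yields one dict per row, so `build_corpus` accepts it directly. Remember that `long=True` is not applicable to `theoremqa`/`aops`/`leetcode`, which have no `long_documents` subset.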

Provider:

maas

Created:

2025-08-18
