mteb/BRIGHT

Name: mteb/BRIGHT
Creator: mteb
Published: 2026-04-02 16:24:21
License: cc-by-4.0 (per the dataset card metadata; the listing itself gives no description)

Source: Hugging Face · Updated 2026-04-02 · Indexed 2026-03-29

Download link:

https://hf-mirror.com/datasets/mteb/BRIGHT

Resource description:

--- language: - en license: cc-by-4.0 size_categories: - 1K<n<10K task_categories: - text-retrieval dataset_info: - config_name: Gemini-1.0_reason features: - name: query dtype: string - name: reasoning dtype: string - name: id dtype: string - name: excluded_ids sequence: string - name: gold_ids_long sequence: string - name: gold_ids sequence: string - name: gold_answer dtype: string splits: - name: biology num_bytes: 571840 num_examples: 103 - name: earth_science num_bytes: 569680 num_examples: 116 - name: economics num_bytes: 617171 num_examples: 103 - name: psychology num_bytes: 557169 num_examples: 101 - name: robotics num_bytes: 467620 num_examples: 101 - name: stackoverflow num_bytes: 652597 num_examples: 117 - name: sustainable_living num_bytes: 626085 num_examples: 108 - name: leetcode num_bytes: 1382908 num_examples: 142 - name: pony num_bytes: 327318 num_examples: 112 - name: aops num_bytes: 14182450 num_examples: 111 - name: theoremqa_questions num_bytes: 13144059 num_examples: 194 - name: theoremqa_theorems num_bytes: 393390 num_examples: 76 download_size: 5946613 dataset_size: 33492287 - config_name: claude-3-opus_reason features: - name: query dtype: string - name: reasoning dtype: string - name: id dtype: string - name: excluded_ids sequence: string - name: gold_ids_long sequence: string - name: gold_ids sequence: string - name: gold_answer dtype: string splits: - name: biology num_bytes: 557025 num_examples: 103 - name: earth_science num_bytes: 558266 num_examples: 116 - name: economics num_bytes: 574237 num_examples: 103 - name: psychology num_bytes: 516517 num_examples: 101 - name: robotics num_bytes: 437209 num_examples: 101 - name: stackoverflow num_bytes: 622316 num_examples: 117 - name: sustainable_living num_bytes: 591735 num_examples: 108 - name: leetcode num_bytes: 1407501 num_examples: 142 - name: pony num_bytes: 361169 num_examples: 112 - name: aops num_bytes: 14149870 num_examples: 111 - name: theoremqa_questions num_bytes: 13121703 
num_examples: 194 - name: theoremqa_theorems num_bytes: 382228 num_examples: 76 download_size: 5777474 dataset_size: 33279776 - config_name: documents features: - name: id dtype: string - name: content dtype: string splits: - name: biology num_bytes: 21983744 num_examples: 57359 - name: earth_science num_bytes: 46952371 num_examples: 121249 - name: economics num_bytes: 22771374 num_examples: 50220 - name: psychology num_bytes: 23167414 num_examples: 52835 - name: robotics num_bytes: 20718385 num_examples: 61961 - name: stackoverflow num_bytes: 189733583 num_examples: 107081 - name: sustainable_living num_bytes: 24373723 num_examples: 60792 - name: pony num_bytes: 2365157 num_examples: 7894 - name: leetcode num_bytes: 456581333 num_examples: 413932 - name: aops num_bytes: 146564475 num_examples: 188002 - name: theoremqa_theorems num_bytes: 21124422 num_examples: 23839 - name: theoremqa_questions num_bytes: 146564475 num_examples: 188002 download_size: 465489179 dataset_size: 1122900456 - config_name: examples features: - name: query dtype: string - name: reasoning dtype: string - name: id dtype: string - name: excluded_ids sequence: string - name: gold_ids_long sequence: string - name: gold_ids sequence: string - name: gold_answer dtype: string splits: - name: biology num_bytes: 326427 num_examples: 103 - name: earth_science num_bytes: 280741 num_examples: 116 - name: economics num_bytes: 343172 num_examples: 103 - name: psychology num_bytes: 286062 num_examples: 101 - name: robotics num_bytes: 366862 num_examples: 101 - name: stackoverflow num_bytes: 470365 num_examples: 117 - name: sustainable_living num_bytes: 336562 num_examples: 108 - name: pony num_bytes: 135009 num_examples: 112 - name: leetcode num_bytes: 1212640 num_examples: 142 - name: aops num_bytes: 13981802 num_examples: 111 - name: theoremqa_theorems num_bytes: 257842 num_examples: 76 - name: theoremqa_questions num_bytes: 12810785 num_examples: 194 download_size: 4590189 dataset_size: 30808269 - 
config_name: gpt4_reason features: - name: query dtype: string - name: reasoning dtype: string - name: id dtype: string - name: excluded_ids sequence: string - name: gold_ids_long sequence: string - name: gold_ids sequence: string - name: gold_answer dtype: string splits: - name: biology num_bytes: 613511 num_examples: 103 - name: earth_science num_bytes: 618266 num_examples: 116 - name: economics num_bytes: 642234 num_examples: 103 - name: psychology num_bytes: 571504 num_examples: 101 - name: robotics num_bytes: 519720 num_examples: 101 - name: stackoverflow num_bytes: 704186 num_examples: 117 - name: sustainable_living num_bytes: 657382 num_examples: 108 - name: leetcode num_bytes: 1461063 num_examples: 142 - name: pony num_bytes: 423199 num_examples: 112 - name: aops num_bytes: 14332394 num_examples: 111 - name: theoremqa_questions num_bytes: 13262646 num_examples: 194 - name: theoremqa_theorems num_bytes: 442272 num_examples: 76 download_size: 6421256 dataset_size: 34248377 - config_name: grit_reason features: - name: query dtype: string - name: reasoning dtype: string - name: id dtype: string - name: excluded_ids sequence: string - name: gold_ids_long sequence: string - name: gold_ids sequence: string - name: gold_answer dtype: string splits: - name: biology num_bytes: 478151 num_examples: 103 - name: earth_science num_bytes: 443792 num_examples: 116 - name: economics num_bytes: 493163 num_examples: 103 - name: psychology num_bytes: 407907 num_examples: 101 - name: robotics num_bytes: 340895 num_examples: 101 - name: stackoverflow num_bytes: 540771 num_examples: 117 - name: sustainable_living num_bytes: 475118 num_examples: 108 - name: leetcode num_bytes: 1305306 num_examples: 142 - name: pony num_bytes: 258002 num_examples: 112 - name: aops num_bytes: 14170933 num_examples: 111 - name: theoremqa_questions num_bytes: 13040517 num_examples: 194 - name: theoremqa_theorems num_bytes: 360282 num_examples: 76 download_size: 5202414 dataset_size: 32314837 - 
config_name: llama3-70b_reason features: - name: query dtype: string - name: reasoning dtype: string - name: id dtype: string - name: excluded_ids sequence: string - name: gold_ids_long sequence: string - name: gold_ids sequence: string - name: gold_answer dtype: string splits: - name: biology num_bytes: 631132 num_examples: 103 - name: earth_science num_bytes: 622087 num_examples: 116 - name: economics num_bytes: 631657 num_examples: 103 - name: psychology num_bytes: 563987 num_examples: 101 - name: robotics num_bytes: 449342 num_examples: 101 - name: stackoverflow num_bytes: 641853 num_examples: 117 - name: sustainable_living num_bytes: 654690 num_examples: 108 - name: leetcode num_bytes: 1376032 num_examples: 142 - name: pony num_bytes: 315870 num_examples: 112 - name: aops num_bytes: 14183895 num_examples: 111 - name: theoremqa_questions num_bytes: 13157545 num_examples: 194 - name: theoremqa_theorems num_bytes: 405388 num_examples: 76 download_size: 5877403 dataset_size: 33633478 - config_name: long_documents features: - name: id dtype: string - name: content dtype: string splits: - name: biology num_bytes: 19454314 num_examples: 524 - name: earth_science num_bytes: 41843262 num_examples: 601 - name: economics num_bytes: 20095594 num_examples: 516 - name: psychology num_bytes: 20541239 num_examples: 512 - name: robotics num_bytes: 18220587 num_examples: 508 - name: stackoverflow num_bytes: 184616744 num_examples: 1858 - name: sustainable_living num_bytes: 21200303 num_examples: 554 - name: pony num_bytes: 2098474 num_examples: 577 download_size: 104578765 dataset_size: 328070517 configs: - config_name: Gemini-1.0_reason data_files: - split: biology path: Gemini-1.0_reason/biology-* - split: earth_science path: Gemini-1.0_reason/earth_science-* - split: economics path: Gemini-1.0_reason/economics-* - split: psychology path: Gemini-1.0_reason/psychology-* - split: robotics path: Gemini-1.0_reason/robotics-* - split: stackoverflow path: 
Gemini-1.0_reason/stackoverflow-* - split: sustainable_living path: Gemini-1.0_reason/sustainable_living-* - split: leetcode path: Gemini-1.0_reason/leetcode-* - split: pony path: Gemini-1.0_reason/pony-* - split: aops path: Gemini-1.0_reason/aops-* - split: theoremqa_questions path: Gemini-1.0_reason/theoremqa_questions-* - split: theoremqa_theorems path: Gemini-1.0_reason/theoremqa_theorems-* - config_name: claude-3-opus_reason data_files: - split: biology path: claude-3-opus_reason/biology-* - split: earth_science path: claude-3-opus_reason/earth_science-* - split: economics path: claude-3-opus_reason/economics-* - split: psychology path: claude-3-opus_reason/psychology-* - split: robotics path: claude-3-opus_reason/robotics-* - split: stackoverflow path: claude-3-opus_reason/stackoverflow-* - split: sustainable_living path: claude-3-opus_reason/sustainable_living-* - split: leetcode path: claude-3-opus_reason/leetcode-* - split: pony path: claude-3-opus_reason/pony-* - split: aops path: claude-3-opus_reason/aops-* - split: theoremqa_questions path: claude-3-opus_reason/theoremqa_questions-* - split: theoremqa_theorems path: claude-3-opus_reason/theoremqa_theorems-* - config_name: documents data_files: - split: biology path: documents/biology-* - split: earth_science path: documents/earth_science-* - split: economics path: documents/economics-* - split: psychology path: documents/psychology-* - split: robotics path: documents/robotics-* - split: stackoverflow path: documents/stackoverflow-* - split: sustainable_living path: documents/sustainable_living-* - split: pony path: documents/pony-* - split: leetcode path: documents/leetcode-* - split: aops path: documents/aops-* - split: theoremqa_theorems path: documents/theoremqa_theorems-* - split: theoremqa_questions path: documents/theoremqa_questions-* - config_name: examples data_files: - split: biology path: examples/biology-* - split: earth_science path: examples/earth_science-* - split: economics path: 
examples/economics-* - split: psychology path: examples/psychology-* - split: robotics path: examples/robotics-* - split: stackoverflow path: examples/stackoverflow-* - split: sustainable_living path: examples/sustainable_living-* - split: pony path: examples/pony-* - split: leetcode path: examples/leetcode-* - split: aops path: examples/aops-* - split: theoremqa_theorems path: examples/theoremqa_theorems-* - split: theoremqa_questions path: examples/theoremqa_questions-* - config_name: gpt4_reason data_files: - split: biology path: gpt4_reason/biology-* - split: earth_science path: gpt4_reason/earth_science-* - split: economics path: gpt4_reason/economics-* - split: psychology path: gpt4_reason/psychology-* - split: robotics path: gpt4_reason/robotics-* - split: stackoverflow path: gpt4_reason/stackoverflow-* - split: sustainable_living path: gpt4_reason/sustainable_living-* - split: leetcode path: gpt4_reason/leetcode-* - split: pony path: gpt4_reason/pony-* - split: aops path: gpt4_reason/aops-* - split: theoremqa_questions path: gpt4_reason/theoremqa_questions-* - split: theoremqa_theorems path: gpt4_reason/theoremqa_theorems-* - config_name: grit_reason data_files: - split: biology path: grit_reason/biology-* - split: earth_science path: grit_reason/earth_science-* - split: economics path: grit_reason/economics-* - split: psychology path: grit_reason/psychology-* - split: robotics path: grit_reason/robotics-* - split: stackoverflow path: grit_reason/stackoverflow-* - split: sustainable_living path: grit_reason/sustainable_living-* - split: leetcode path: grit_reason/leetcode-* - split: pony path: grit_reason/pony-* - split: aops path: grit_reason/aops-* - split: theoremqa_questions path: grit_reason/theoremqa_questions-* - split: theoremqa_theorems path: grit_reason/theoremqa_theorems-* - config_name: llama3-70b_reason data_files: - split: biology path: llama3-70b_reason/biology-* - split: earth_science path: llama3-70b_reason/earth_science-* - split: 
economics path: llama3-70b_reason/economics-* - split: psychology path: llama3-70b_reason/psychology-* - split: robotics path: llama3-70b_reason/robotics-* - split: stackoverflow path: llama3-70b_reason/stackoverflow-* - split: sustainable_living path: llama3-70b_reason/sustainable_living-* - split: leetcode path: llama3-70b_reason/leetcode-* - split: pony path: llama3-70b_reason/pony-* - split: aops path: llama3-70b_reason/aops-* - split: theoremqa_questions path: llama3-70b_reason/theoremqa_questions-* - split: theoremqa_theorems path: llama3-70b_reason/theoremqa_theorems-* - config_name: long_documents data_files: - split: biology path: long_documents/biology-* - split: earth_science path: long_documents/earth_science-* - split: economics path: long_documents/economics-* - split: psychology path: long_documents/psychology-* - split: robotics path: long_documents/robotics-* - split: stackoverflow path: long_documents/stackoverflow-* - split: sustainable_living path: long_documents/sustainable_living-* - split: pony path: long_documents/pony-* tags: - text-retrieval - code - biology - earth_science - economics - psychology - robotics - math ---

# BRIGHT benchmark

BRIGHT is the first text retrieval benchmark that requires intensive reasoning to retrieve relevant documents. The queries are collected from diverse domains (StackExchange, LeetCode, and math competitions), all sourced from realistic human data. Experiments show that existing retrieval models perform poorly on BRIGHT, where the highest score is only 22.1 measured by nDCG@10. BRIGHT provides a good testbed for future retrieval research in more realistic and challenging settings. More details are in the [paper](https://brightbenchmark.github.io/).

## Dataset Structure

We unify all the datasets with consistent formats. They are organized into three subsets, examples, documents, and long_documents:

* `examples`:
  * `query`: the query for retrieval
  * `reasoning`: the gold reasoning steps annotated by humans (they help people understand the relevance between queries and documents, but are not used in any experiment in the paper)
  * `id`: the index of the instance
  * `excluded_ids`: a list of the ids (string) to exclude during evaluation (only for `theoremqa`/`aops`/`leetcode`)
  * `gold_ids_long`: a list of the ids (string) of the ground truth documents, corresponding to the ids of the `long_documents` subset
  * `gold_ids`: a list of the ids (string) of the ground truth documents, corresponding to the indices of the `documents` subset
* `documents`:
  * `id`: the index of the document
  * `content`: document content (short version split from the complete web page, blogs, etc., or a problem and solution pair)
* `long_documents` (not applicable to `theoremqa`/`aops`/`leetcode`):
  * `id`: the index of the document
  * `content`: document content (long version corresponding to the complete web page, blogs, etc.)

## Dataset Statistics

<img src="statistics.png" width="80%" alt="BRIGHT statistics">

## Data Loading

Each dataset can be easily loaded. For example, to load biology examples:

```
from datasets import load_dataset
data = load_dataset('xlangai/BRIGHT', 'examples')['biology']
```

## Citation

If you find our work helpful, please cite us:

```citation
@misc{BRIGHT,
  title={BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval},
  author={Su, Hongjin and Yen, Howard and Xia, Mengzhou and Shi, Weijia and Muennighoff, Niklas and Wang, Han-yu and Liu, Haisu and Shi, Quan and Siegel, Zachary S and Tang, Michael and Sun, Ruoxi and Yoon, Jinsung and Arik, Sercan O and Chen, Danqi and Yu, Tao},
  url={https://arxiv.org/abs/2407.12883},
  year={2024},
}
```

Provided by:

mteb

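Summary

Dataset Introduction

Construction

In information retrieval, where query needs keep growing more complex, BRIGHT was built to simulate real-world retrieval tasks that require deep reasoning. It draws on real human queries from StackExchange, LeetCode, and math competitions, and links each query to its relevant documents. The data is organized into three subsets: `examples` holds the queries, the human-annotated reasoning steps, and the gold document ids; `documents` holds the short passages split from the source pages; and `long_documents` keeps the complete original web pages or blog posts. Together they form a layered retrieval evaluation setup.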
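The link between the three subsets can be sketched in a few lines: `gold_ids` points into `documents` and `gold_ids_long` points into `long_documents`. The rows below are invented toy data that only mimic the field names from the dataset card, and `build_index` and `gold_passages` are hypothetical helper names, not part of any BRIGHT tooling.

```python
# Toy rows that mimic the `examples`, `documents`, and `long_documents` schemas.
# All ids and contents here are invented for illustration.
examples = [
    {
        "query": "Why do leaves change color in autumn?",
        "id": "0",
        "excluded_ids": [],
        "gold_ids": ["doc_a", "doc_c"],
        "gold_ids_long": ["long_1"],
    }
]
documents = [
    {"id": "doc_a", "content": "Chlorophyll breaks down as days shorten."},
    {"id": "doc_b", "content": "Maple syrup is harvested in early spring."},
    {"id": "doc_c", "content": "Carotenoids become visible once chlorophyll fades."},
]
long_documents = [
    {"id": "long_1", "content": "Full article on leaf pigments and seasonal change."},
]


def build_index(rows):
    """Map each document id to its content for constant-time lookup."""
    return {row["id"]: row["content"] for row in rows}


def gold_passages(example, doc_index, key="gold_ids"):
    """Return the gold document contents for one example, in gold-id order.

    `key` selects which id list to follow: "gold_ids" for the short
    `documents` subset or "gold_ids_long" for `long_documents`.
    """
    return [doc_index[doc_id] for doc_id in example[key] if doc_id in doc_index]


short_index = build_index(documents)
long_index = build_index(long_documents)

short_gold = gold_passages(examples[0], short_index)
long_gold = gold_passages(examples[0], long_index, key="gold_ids_long")
```

With real data, the same two helpers would be applied to one split of the `examples` configuration and the matching split of `documents` or `long_documents`.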

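Features

BRIGHT's defining feature is its emphasis on reasoning-intensive retrieval, which moves beyond the surface-level semantic matching that conventional retrieval tasks rely on. The dataset spans twelve domain splits, including biology, earth science, economics, psychology, and robotics, as well as harder programming and math-competition settings, which makes it both diverse and specialized. Each query comes with annotated reasoning steps that expose the deeper logical connection between the query and its relevant documents, giving a rigorous test of how well models understand complex contexts.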

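Usage

BRIGHT can be loaded through the Hugging Face `datasets` library. Users pick a configuration and a domain split as needed, for example the configuration `'examples'` and the split `'biology'` to get the biology queries. Fields such as `gold_ids` and `excluded_ids` give clear guidance for designing retrieval experiments and evaluation metrics (such as nDCG@10), so the benchmark can be applied directly to training and testing current retrieval models.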
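As a rough illustration of how `gold_ids` and `excluded_ids` feed into nDCG@10, here is a minimal sketch. It assumes binary relevance (a document is relevant only if its id is in `gold_ids`), drops excluded ids from the ranking before truncating to the top k, and assumes the ranked ids are unique. Whether the official evaluation handles these details the same way is not stated in the listing, and `ndcg_at_k` is a hypothetical helper name.

```python
import math


def ndcg_at_k(ranked_ids, gold_ids, excluded_ids=(), k=10):
    """Binary-relevance nDCG@k for a single query.

    ranked_ids   -- document ids ordered by the retriever, best first (unique)
    gold_ids     -- ids of the ground-truth documents
    excluded_ids -- ids removed from the ranking before scoring (assumption)
    """
    gold = set(gold_ids)
    excluded = set(excluded_ids)

    # Remove excluded documents first, then keep the top k of what remains.
    kept = [doc_id for doc_id in ranked_ids if doc_id not in excluded][:k]

    # DCG: each relevant hit at 0-based position p contributes 1 / log2(p + 2).
    dcg = sum(
        1.0 / math.log2(pos + 2) for pos, doc_id in enumerate(kept) if doc_id in gold
    )

    # Ideal DCG: all gold documents (up to k) ranked at the top.
    ideal_hits = min(len(gold), k)
    idcg = sum(1.0 / math.log2(pos + 2) for pos in range(ideal_hits))

    return dcg / idcg if idcg > 0 else 0.0


# Invented ranking: the gold document "a" sits behind "x".
with_exclusion = ndcg_at_k(["x", "a", "b"], gold_ids=["a"], excluded_ids=["x"])
without_exclusion = ndcg_at_k(["x", "a", "b"], gold_ids=["a"])
```

In the toy call, excluding `"x"` moves the gold document to the first position, so the two scores differ. A benchmark-level score would be the mean of this value over all queries in a split.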

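Background and Challenges

Background Overview

Conventional retrieval models usually depend on keyword matching or shallow semantic association and struggle with complex queries that need deep logical reasoning. To address this, the research team released the BRIGHT benchmark in 2024. Built jointly by experts from multiple fields, it is designed to evaluate models on text retrieval tasks that require intensive reasoning. Its central research question is how to improve a retrieval system's understanding and reasoning in realistic, complex, cross-disciplinary settings. It covers biology, earth science, economics, psychology, robotics, math competitions, and other specialist domains, and serves as a key evaluation tool for moving retrieval toward deeper understanding.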

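Current Challenges

The domain challenge BRIGHT targets is that existing retrieval models fall well short on queries that need multi-step logical reasoning and deep domain knowledge. On mathematical proofs or specialized coding problems, for example, models struggle to link a query to its relevant documents. On the construction side, the main challenge was collecting and annotating high-quality, diverse query-document pairs from real human data sources such as StackExchange and LeetCode while keeping the reasoning steps accurate and the domain coverage broad, which requires careful integration of domain knowledge and large-scale data cleaning.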

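Common Scenarios

Typical Use Cases

BRIGHT provides a standard setting for evaluating retrieval models on complex reasoning tasks. It contains queries and documents from biology, earth science, economics, psychology, and robotics, as well as StackOverflow, LeetCode, and math competitions, and each query requires deep reasoning to identify its relevant documents. Researchers typically use it to test how retrieval models handle queries that need multi-step logical analysis and integrated domain knowledge, with particular attention to retrieval accuracy and robustness on long documents and cross-domain content.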
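To make concrete why such queries are hard for surface-matching retrievers, the sketch below ranks two invented documents with a plain token-overlap score. This is a hypothetical toy baseline written for illustration, not a method evaluated on BRIGHT, and the query, documents, and function names are all made up.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def overlap_score(query, content):
    """Count tokens shared by query and document (with multiplicity)."""
    q_counts = Counter(tokenize(query))
    d_counts = Counter(tokenize(content))
    return sum(min(count, d_counts[token]) for token, count in q_counts.items())


def rank(query, documents, top_k=10):
    """Return document ids sorted by overlap score, ties broken by id."""
    scored = [(overlap_score(query, doc["content"]), doc["id"]) for doc in documents]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:top_k]]


query = "Why does my plant lean toward the window?"
documents = [
    # Explains the mechanism but shares almost no words with the query.
    {"id": "d1", "content": "Auxin redistributes to the shaded side of a stem, "
                            "causing cells there to elongate."},
    # Shares many words with the query but does not answer it.
    {"id": "d2", "content": "My window plant shelf: why I lean on cheap brackets."},
]

ranking = rank(query, documents)
```

Here the lexically similar but unhelpful `d2` is ranked above `d1`, the document that actually explains the phenomenon. Closing that kind of gap is what a reasoning-intensive benchmark is meant to measure.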

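Derived and Related Work

Several research efforts on reasoning-enhanced retrieval have grown out of BRIGHT. They mainly explore how to combine the reasoning ability of large language models with retrieval systems, for example using generative retrieval or retrieval-augmented generation to improve relevance judgments for complex queries. Other studies use BRIGHT's multi-domain coverage to develop cross-domain adaptive retrieval methods that generalize better in specialist settings. Together, this work pushes retrieval toward deeper understanding.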

Recent Research on the Dataset