yourbench_example_mc_misleading_small_2

Source: Hugging Face. Updated 2025-07-21; indexed 2025-07-22.

Download link:

https://huggingface.co/datasets/philipp219/yourbench_example_mc_misleading_small_2

Overview:

This dataset contains five configurations, each storing different text information. The chunked configuration stores each document's text, filename, metadata, and summaries; the ingested configuration stores basic document information; the lighteval configuration holds question-generation output; the single_shot_questions configuration holds single-turn question-answering data; and the summarized configuration stores document summaries. Every configuration provides a train split.

Created:

2025-07-18

Original information summary

Dataset overview

Basic information

Dataset name: yourbench_example_mc_misleading_small_2
Dataset URL: https://huggingface.co/datasets/philipp219/yourbench_example_mc_misleading_small_2

Dataset configurations

The dataset contains the following 5 configurations:

1. chunked

Features:
- document_id (string)
- document_text (string)
- document_filename (string)
- document_metadata (struct: file_size (int64))
- raw_chunk_summaries (sequence: string)
- chunk_summaries (sequence: string)
- raw_document_summary (string)
- document_summary (string)
- summarization_model (string)
- chunks (list: chunk_id (string), chunk_text (string))
- multihop_chunks (list: chunk_ids (sequence: string), chunks_text (sequence: string))
Splits:
- train: 3 examples, 95,953 bytes
Download size: 63,437 bytes
Dataset size: 95,953 bytes
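
The nested chunks and multihop_chunks features can be read roughly as follows. This is a minimal sketch on a hand-written record: the field names come from the feature list above, but the sample values and the helpers chunk_lookup and resolve_multihop are hypothetical. It also assumes that rows come back as lists of dicts and that chunk_ids in multihop_chunks refer to chunk_id values in chunks, neither of which the listing confirms.

```python
from typing import Dict, List

# Hand-written record that mirrors part of the chunked feature list.
# All values are made up for illustration.
record = {
    "document_id": "doc-0",
    "chunks": [
        {"chunk_id": "doc-0_0", "chunk_text": "First passage."},
        {"chunk_id": "doc-0_1", "chunk_text": "Second passage."},
    ],
    "multihop_chunks": [
        {
            "chunk_ids": ["doc-0_0", "doc-0_1"],
            "chunks_text": ["First passage.", "Second passage."],
        },
    ],
}


def chunk_lookup(rec: dict) -> Dict[str, str]:
    """Map each chunk_id to its chunk_text (hypothetical helper)."""
    return {c["chunk_id"]: c["chunk_text"] for c in rec["chunks"]}


def resolve_multihop(rec: dict) -> List[List[str]]:
    """Rebuild the text of each multi-hop group from its chunk_ids.

    Assumes every id in chunk_ids points at an entry of `chunks`.
    """
    lookup = chunk_lookup(rec)
    return [
        [lookup[cid] for cid in group["chunk_ids"]]
        for group in rec["multihop_chunks"]
    ]


print(resolve_multihop(record))
```

If the assumption holds, the rebuilt text should match the chunks_text field stored alongside each group, which makes for a cheap consistency check.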

2. ingested

Features:
- document_id (string)
- document_text (string)
- document_filename (string)
- document_metadata (struct: file_size (int64))
Splits:
- train: 3 examples, 32,381 bytes
Download size: 12,867 bytes
Dataset size: 32,381 bytes

3. lighteval

Features:
- question (string)
- additional_instructions (string)
- ground_truth_answer (string)
- gold (sequence: int64)
- choices (sequence: string)
- question_category (string)
- kind (string)
- estimated_difficulty (int64)
- citations (sequence: string)
- document_id (string)
- chunk_ids (sequence: string)
- question_generating_model (string)
- chunks (sequence: string)
- document (string)
- document_summary (string)
Splits:
- train: 8 examples, 93,474 bytes
Download size: 28,764 bytes
Dataset size: 93,474 bytes
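
A multiple-choice row from this configuration could be scored by comparing a predicted option against gold. The sketch below assumes that gold holds zero-based indices into choices, which the listing does not state; the sample row and the helpers gold_answers and is_correct are hypothetical.

```python
from typing import List, TypedDict


class McRow(TypedDict):
    """Subset of the lighteval features used for scoring."""
    question: str
    choices: List[str]
    gold: List[int]


# Hand-written sample row; the values are made up for illustration.
row: McRow = {
    "question": "Which planet is closest to the Sun?",
    "choices": ["(A) Venus", "(B) Mercury", "(C) Mars", "(D) Earth"],
    "gold": [1],
}


def gold_answers(r: McRow) -> List[str]:
    """Return the text of every choice marked correct (assumes zero-based indices)."""
    return [r["choices"][i] for i in r["gold"]]


def is_correct(r: McRow, predicted_index: int) -> bool:
    """A prediction counts as correct if its index appears in `gold`."""
    return predicted_index in r["gold"]


print(gold_answers(row))   # ['(B) Mercury']
print(is_correct(row, 0))  # False
```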

4. single_shot_questions

Features:
- document_id (string)
- additional_instructions (string)
- question (string)
- self_answer (string)
- estimated_difficulty (int64)
- self_assessed_question_type (string)
- generating_model (string)
- thought_process (string)
- raw_response (string)
- citations (sequence: string)
- original_question (null)
- question_rewriting_model (null)
- question_rewriting_rationale (null)
- raw_question_rewriting_response (null)
- choices (sequence: string)
- chunk_id (string)
Splits:
- train: 10 examples, 47,162 bytes
Download size: 24,842 bytes
Dataset size: 47,162 bytes

5. summarized

Features:
- document_id (string)
- document_text (string)
- document_filename (string)
- document_metadata (struct: file_size (int64))
- raw_chunk_summaries (sequence: string)
- chunk_summaries (sequence: string)
- raw_document_summary (string)
- document_summary (string)
- summarization_model (string)
Splits:
- train: 3 examples, 41,748 bytes
Download size: 32,893 bytes
Dataset size: 41,748 bytes
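
The per-configuration figures listed above can be tallied with a few lines of arithmetic. The numbers are copied from the listing; the SIZES table and the variable names are only illustrative.

```python
# (examples, dataset_bytes, download_bytes) per configuration,
# copied from the listing above.
SIZES = {
    "chunked": (3, 95_953, 63_437),
    "ingested": (3, 32_381, 12_867),
    "lighteval": (8, 93_474, 28_764),
    "single_shot_questions": (10, 47_162, 24_842),
    "summarized": (3, 41_748, 32_893),
}

total_examples = sum(n for n, _, _ in SIZES.values())
total_dataset_bytes = sum(d for _, d, _ in SIZES.values())
total_download_bytes = sum(dl for _, _, dl in SIZES.values())

# Ratio of download size to in-memory dataset size for each configuration.
for name, (n, dataset_bytes, download_bytes) in SIZES.items():
    ratio = download_bytes / dataset_bytes
    print(f"{name}: {n} examples, download/dataset ratio {ratio:.2f}")

print(total_examples, total_dataset_bytes, total_download_bytes)
# 27 310718 162803
```

Across all five configurations that comes to 27 examples and roughly 311 KB of data, so the whole dataset is small enough to inspect by hand.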

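Collected summary

Dataset introduction

Construction

The dataset is built through a multi-stage, structured pipeline: source documents are split into semantically coherent chunks, and a summarization model produces summaries at several levels. The configurations (chunked, ingested, lighteval, and so on) each preserve the intermediate output of a different stage, such as document chunking, summary generation, or question construction, so that the data forms a complete chain from source text to questions. Stages that call a model record its name (summarization_model, question_generating_model, generating_model) alongside the document metadata, which keeps the processing traceable.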

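Characteristics

The dataset's most distinctive trait is that it represents the same material at several levels: it contains the original documents and their chunks together with automatically generated summaries, multi-hop questions, and reference answers. The lighteval configuration pairs each question with a difficulty estimate (estimated_difficulty) and a category label (question_category), while the single_shot_questions configuration retains the question-generating model's reasoning trace (thought_process). This layered structure offers rich supervision signals for research on document understanding and question answering.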

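Usage

Choose a configuration according to the research goal: chunked or summarized for document analysis that needs structured text; lighteval for annotated questions when developing question-answering systems; and single_shot_questions for studying model reasoning through its chain-of-thought data. All data is accessed through standardized fields, for example document_text for the original text and chunks for the chunked content, so information at different granularities can be retrieved flexibly.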
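
The choice of configuration described above could be wired up roughly like this. The sketch assumes the Hugging Face datasets library is installed and that the repository is publicly reachable; the CONFIG_FOR_GOAL mapping, its goal names, and the load_for_goal helper are hypothetical and not part of the dataset.

```python
REPO_ID = "philipp219/yourbench_example_mc_misleading_small_2"

# Hypothetical mapping from a research goal to the configuration
# recommended for it in the usage notes.
CONFIG_FOR_GOAL = {
    "document_analysis": "chunked",
    "summaries": "summarized",
    "question_answering": "lighteval",
    "reasoning_traces": "single_shot_questions",
}


def load_for_goal(goal: str):
    """Load the train split of the configuration matching a research goal."""
    # Imported lazily so the mapping above can be used without the library.
    from datasets import load_dataset

    return load_dataset(REPO_ID, CONFIG_FOR_GOAL[goal], split="train")
```

With network access, load_for_goal("question_answering") should return the lighteval rows, whose question, choices, and gold fields can then be read by name.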

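Background and challenges

Background overview

yourbench_example_mc_misleading_small_2 is a benchmark collection focused on evaluating multi-hop reasoning and text summarization. It was designed to address two core problems in complex question-answering systems: integrating information and recognizing misleading content. According to this summary, it was built by a research team and uses structured documents, chunk summaries, and multi-hop question chains to give natural language processing a standardized tool for evaluating deep model understanding. Its multi-configuration layout covers the full pipeline from raw text to question generation, and the summary credits it with advancing research on machine reading comprehension and reasoning.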

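Current challenges

The dataset faces challenges along two dimensions. At the task level, it must accurately evaluate a model's ability to connect scattered information and to detect subtly misleading content, which requires multi-hop questions of real cognitive complexity. At the construction level, it must keep document chunks and summaries semantically coherent while keeping question-answer pairs logically rigorous, which calls for careful quality control. Maintaining data consistency across configurations and manually validating generated questions are further technical difficulties.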

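Common scenarios

Classic usage scenarios

In natural language processing, yourbench_example_mc_misleading_small_2, with its multi-hop reasoning and misleading questions, serves as a tool for evaluating a model's logical reasoning. Using the dataset's multi-hop questions and distractor options, researchers can systematically test how well a model integrates information and rejects distractors in complex contexts, which makes it particularly suitable for checking the robustness of question-answering and reading-comprehension models.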

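Practical applications

According to this summary, the dataset has been applied to scenarios such as distractor-robustness training for intelligent customer-service systems and logic verification in legal-document analysis; no specific deployments are cited. Its multi-level summaries and generated questions suit solutions that process complex documents in specialist fields such as medical report parsing and financial contract review.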

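Derived and related work

According to this summary, research derived from the dataset includes new multi-hop attention mechanisms and improved frameworks for evaluating the distractor robustness of pretrained models, with results published at venues such as ACL and EMNLP; no specific papers are cited. The summary also credits the dataset with shaping the construction of later benchmarks such as HotpotQA, but HotpotQA was published in 2018 and therefore predates this dataset, which was created on 2025-07-18.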

The content above was collected and summarized by 遇见数据集.
