NiklasPythonstein/injected-datasets

Name: NiklasPythonstein/injected-datasets
Creator: NiklasPythonstein
Published: 2026-04-25 15:38:19
License: No description provided

Source: Hugging Face (updated 2026-04-25, indexed 2026-05-03)

Download link:

https://hf-mirror.com/datasets/NiklasPythonstein/injected-datasets

Summary:

This dataset is designed for reading-comprehension or multiple-choice question tasks. It has seven configurations, each corresponding to one flaw type: double_key, incorrect_key, missing_critical_span, missing_key, unanswerable_question, unreasonable_distractors, and verbatim_key. Each sample contains a title, a paragraph, a difficulty level (Advanced, Intermediate, or Elementary), a question, a paragraph index, a list of four answer options, a_span and d_span lists, a key index (the position of the designated correct option, or key), a label, a flaw type, and a unique ID. The dataset simulates flawed question generation so that models can be evaluated on how they handle each flaw type. Every configuration has the same split sizes: training (364 examples), development (57 examples), and test (55 examples).
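
The record layout and split sizes described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the dataset's own loader: the field names in `Item` (such as `uid`, `key_index`, `flaw_type`) are hypothetical, because this page lists the fields but not their column names; `key_index` is assumed to be a 0-based index into `options`; `flaw_type` values are assumed to match the configuration names; and the split sizes are read as per-configuration counts. Only `load_config` touches the network, through `load_dataset(path, name)` from the Hugging Face `datasets` library.

```python
from dataclasses import dataclass
from typing import Any, List

# The seven configurations (one per flaw type), as listed on this page.
CONFIGS = (
    "double_key",
    "incorrect_key",
    "missing_critical_span",
    "missing_key",
    "unanswerable_question",
    "unreasonable_distractors",
    "verbatim_key",
)

# Split sizes from the summary, read as per-configuration counts (assumption).
# The split names on the Hub may differ (e.g. "validation" instead of "dev").
SPLIT_SIZES = {"train": 364, "dev": 57, "test": 55}

DIFFICULTY_LEVELS = {"Advanced", "Intermediate", "Elementary"}


@dataclass
class Item:
    """One sample. All field names are hypothetical stand-ins for the real columns."""
    uid: str
    title: str
    paragraph: str
    level: str
    question: str
    paragraph_index: int
    options: List[str]
    a_span: List[Any]
    d_span: List[Any]
    key_index: int  # assumed 0-based index into options
    label: Any      # type not documented on this page
    flaw_type: str  # assumed to equal one of CONFIGS


def validate(item: Item) -> List[str]:
    """Return structural problems found in one sample (empty list if none)."""
    problems = []
    if len(item.options) != 4:
        problems.append(f"expected 4 options, got {len(item.options)}")
    if not 0 <= item.key_index < len(item.options):
        problems.append(f"key_index {item.key_index} out of range")
    if item.level not in DIFFICULTY_LEVELS:
        problems.append(f"unknown difficulty level {item.level!r}")
    if item.flaw_type not in CONFIGS:
        problems.append(f"unknown flaw type {item.flaw_type!r}")
    return problems


def total_examples() -> int:
    """Total sample count across all configurations, if sizes are per configuration."""
    return len(CONFIGS) * sum(SPLIT_SIZES.values())


def load_config(name: str):
    """Load one configuration via Hugging Face `datasets` (needs the package and network)."""
    if name not in CONFIGS:
        raise ValueError(f"unknown configuration {name!r}")
    from datasets import load_dataset  # lazy import: the helpers above work without it
    return load_dataset("NiklasPythonstein/injected-datasets", name)
```

Under the per-configuration reading, each configuration holds 364 + 57 + 55 = 476 samples, or 3,332 across all seven. Because the items are deliberately flawed, a check such as the key-index range may legitimately fail for some configurations (missing_key, for instance), so `validate` is best treated as a structural sanity check rather than a flaw detector.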

Provider:

NiklasPythonstein
