PopQA_robustness

Name: PopQA_robustness
Creator: maas
Published: 2025-11-27 16:50:57
License: No description available

ModelScope Community: updated 2025-11-27, indexed 2025-11-03

Download link:

https://modelscope.cn/datasets/ibm-research/PopQA_robustness

Resource description:

# Dataset Card for "PopQA-robustness"

### Dataset Summary

PopQA-robustness is an expanded version of the PopQA dataset (https://aclanthology.org/2023.acl-long.546/) in which the original input questions are perturbed. It is intended as a benchmark for evaluating how robust a model's question answering is to these perturbations.

### Data Instances

#### popqa_robustness

- **Size of downloaded dataset file:** 26.4 MB

### Data Fields

#### popqa_robustness

- `id` (integer): grouping ID shared by an original question and all of its variants.
- `question` (string): a variant of a question from PopQA.
- `variant_id` (integer): identifier of the variant; 0 indicates the original, unperturbed question.
- `variant_type` (string): name of the expansion variant type: "original" is the original question; "simple" is a superficial, non-semantic perturbation; "paraphrase" is a semantic paraphrase of the question.
- `possible_answers` (string): list of possible answer strings.

### Citation Information

```
@misc{ackerman2024novelmetricmeasuringrobustness,
  title={A Novel Metric for Measuring the Robustness of Large Language Models in Non-adversarial Scenarios},
  author={Samuel Ackerman and Ella Rabinovich and Eitan Farchi and Ateret Anaby-Tavor},
  year={2024},
  eprint={2408.01963},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2408.01963},
}
```
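
The Data Fields section implies how an evaluation is organized: rows sharing an `id` form a group, and the member with `variant_id` 0 is the original question. The Python sketch below is a minimal illustration under stated assumptions. It is not the authors' evaluation code and not the robustness metric proposed in the cited paper. It assumes `possible_answers` holds a JSON-encoded list (the card types the field as a string but describes it as a list), scores correctness with a hypothetical case-insensitive substring match, and computes two simple, hypothetical summaries: accuracy per `variant_type`, and how often a perturbed variant's correctness agrees with its group's original. The `Row` class, all helper names, the toy rows, and the toy predictor are invented for illustration.

```python
import json
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Row:
    """Hypothetical container mirroring the fields listed in the dataset card."""
    id: int
    question: str
    variant_id: int
    variant_type: str
    possible_answers: str  # ASSUMPTION: a JSON-encoded list of answer strings


def parse_answers(raw):
    """Decode possible_answers; input that is not valid JSON is treated as a single answer (assumption)."""
    try:
        value = json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        return [raw]
    return [str(a) for a in value] if isinstance(value, list) else [str(value)]


def is_correct(prediction, raw_answers):
    """Hypothetical lenient match: correct if any candidate answer occurs in the prediction (case-insensitive)."""
    pred = prediction.lower()
    return any(ans.lower() in pred for ans in parse_answers(raw_answers))


def accuracy_by_variant_type(rows, predict):
    """Fraction of correct predictions for each variant_type ("original", "simple", "paraphrase")."""
    hits, totals = defaultdict(int), defaultdict(int)
    for row in rows:
        totals[row.variant_type] += 1
        hits[row.variant_type] += is_correct(predict(row.question), row.possible_answers)
    return {vt: hits[vt] / totals[vt] for vt in totals}


def consistency_with_original(rows, predict):
    """Share of perturbed variants (variant_id != 0) whose correctness matches the original in the same id group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row.id].append(row)
    agree = total = 0
    for members in groups.values():
        originals = [r for r in members if r.variant_id == 0]
        if not originals:
            continue  # skip groups without an unperturbed question
        ref = is_correct(predict(originals[0].question), originals[0].possible_answers)
        for r in members:
            if r.variant_id != 0:
                total += 1
                agree += is_correct(predict(r.question), r.possible_answers) == ref
    return agree / total if total else float("nan")


# Toy rows and predictor, invented for illustration (not taken from the dataset).
rows = [
    Row(1, "What is the capital of France?", 0, "original", '["Paris"]'),
    Row(1, "what is the capital of france ?", 1, "simple", '["Paris"]'),
    Row(1, "Which city is France's capital?", 2, "paraphrase", '["Paris"]'),
    Row(2, "Who wrote Hamlet?", 0, "original", '["William Shakespeare", "Shakespeare"]'),
    Row(2, "Which author wrote Hamlet?", 1, "paraphrase", '["William Shakespeare", "Shakespeare"]'),
]
toy_predictions = {
    "What is the capital of France?": "Paris",
    "what is the capital of france ?": "Paris",
    "Which city is France's capital?": "Lyon",
    "Who wrote Hamlet?": "Shakespeare",
    "Which author wrote Hamlet?": "Christopher Marlowe",
}


def predict(question):
    return toy_predictions.get(question, "")


print(accuracy_by_variant_type(rows, predict))
# -> {'original': 1.0, 'simple': 1.0, 'paraphrase': 0.0}
print(consistency_with_original(rows, predict))
# -> 0.3333333333333333
```

Substring matching is deliberately lenient (a prediction that mentions any candidate counts as correct); exact-match scoring may be preferable in practice, and the cited paper should be consulted for the metric it actually proposes.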

Provider: maas

Created: 2025-10-03
