PopQA_robustness

ModelScope Community. Updated: 2025-11-27. Indexed: 2025-11-03.
Download link:
https://modelscope.cn/datasets/ibm-research/PopQA_robustness
Resource description:
# Dataset Card for "PopQA-robustness"

### Dataset Summary

PopQA-robustness is an expanded version of the PopQA dataset (https://aclanthology.org/2023.acl-long.546/) in which each original input question is accompanied by perturbed variants. It is intended as a benchmark for evaluating how robust a question-answering model is to these perturbations.

### Data Instances

#### popqa_robustness

- **Size of downloaded dataset file:** 26.4 MB

### Data Fields

#### popqa_robustness

- `id` (integer): grouping ID of the original question; all variants of one question share it.
- `question` (string): variant of a question from PopQA.
- `variant_id` (integer): identifier of the variant. 0 indicates the original, unperturbed question.
- `variant_type` (string): name of the expansion variant type. "original" is the original question; "simple" is a superficial, non-semantic perturbation; "paraphrase" is a semantic paraphrase of the question.
- `possible_answers` (string): list of possible answer strings, stored as a single string.

### Citation Information

```
@misc{ackerman2024novelmetricmeasuringrobustness,
  title={A Novel Metric for Measuring the Robustness of Large Language Models in Non-adversarial Scenarios},
  author={Samuel Ackerman and Ella Rabinovich and Eitan Farchi and Ateret Anaby-Tavor},
  year={2024},
  eprint={2408.01963},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2408.01963},
}
```
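The fields listed under Data Fields suggest a natural way to use the benchmark: group rows by `id`, take the row with `variant_id` 0 as the reference, and check whether a model behaves the same on the perturbed variants. The following is a minimal sketch of that idea, not the metric from the cited paper. The sample rows, `toy_predict`, the loose substring match, and the assumption that `possible_answers` is a JSON-encoded list are all illustrative and not taken from the dataset itself.

```python
import json
from collections import defaultdict

# Hypothetical rows mirroring the documented fields; the values are invented.
rows = [
    {"id": 1, "question": "Which city serves as France's seat of government?",
     "variant_id": 2, "variant_type": "paraphrase", "possible_answers": '["Paris"]'},
    {"id": 1, "question": "What is the capital of France?",
     "variant_id": 0, "variant_type": "original", "possible_answers": '["Paris"]'},
    {"id": 1, "question": "what is the capital of france",
     "variant_id": 1, "variant_type": "simple", "possible_answers": '["Paris"]'},
]


def group_variants(rows):
    """Group rows by question `id`, ordering each group by `variant_id`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["id"]].append(row)
    for variants in groups.values():
        variants.sort(key=lambda r: r["variant_id"])
    return dict(groups)


def is_correct(prediction, possible_answers):
    """Loose match: any accepted answer appears in the prediction.

    Assumption: `possible_answers` is a JSON-encoded list of strings.
    """
    answers = json.loads(possible_answers)
    pred = prediction.strip().lower()
    return any(a.strip().lower() in pred for a in answers)


def agreement_with_original(variants, predict):
    """Fraction of perturbed variants whose correctness matches the original's."""
    original = next(v for v in variants if v["variant_id"] == 0)
    base = is_correct(predict(original["question"]), original["possible_answers"])
    perturbed = [v for v in variants if v["variant_id"] != 0]
    if not perturbed:
        return None
    same = sum(
        is_correct(predict(v["question"]), v["possible_answers"]) == base
        for v in perturbed
    )
    return same / len(perturbed)


def toy_predict(question):
    """Stand-in for a real model: answers only when it sees the word 'capital'."""
    return "Paris" if "capital" in question.lower() else "unknown"


groups = group_variants(rows)
for qid, variants in groups.items():
    print(qid, agreement_with_original(variants, toy_predict))
```

With a real model, `toy_predict` would be replaced by a call to that model, and the agreement values could be averaged over all `id` groups or broken down by `variant_type`.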

Provider:
maas
Created:
2025-10-03