ibm-research/popqa-tp

Name: ibm-research/popqa-tp
Creator: ibm-research
Published: 2023-10-31 17:45:29
License: MIT

Hugging Face, updated 2023-10-31, indexed 2025-04-12

Download link:

https://hf-mirror.com/datasets/ibm-research/popqa-tp


Description:

---
license: mit
---

# Dataset Card for "popqa-tp"

### Dataset Summary

PopQA-TP (PopQA Templated Paraphrases) is a dataset derived from PopQA (https://huggingface.co/datasets/akariasai/PopQA), created for the paper "Predicting Question-Answering Performance of Large Language Models through Semantic Consistency".

PopQA-TP takes each question in PopQA and paraphrases it with several manually created templates; each question category has its own set of templates.

The paper investigates the relationship between two quantities: the semantic consistency of the answers generated for a question's paraphrases, and the accuracy (correctness) of the answer generated for the original question. Accuracy is evaluated by string match against any one of the ground-truth answers.

PopQA-TP can be used as a benchmark dataset for evaluating the semantic consistency of LLMs in the context of factoid question answering (QA).

### Data Instances

#### popqa-tp

- **Size of downloaded dataset file:** 15.4 MB

### Data Fields

#### popqa-tp

- `paraphrase` (string): paraphrase of a question from PopQA.
- `prop` (string): relationship type category of the question.
- `template_id` (integer): ID of the paraphrase template used to create `paraphrase`. A value of 0 indicates the original question form from PopQA.
- `possible_answers` (list of strings): a list of the gold answers.
- `id` (integer): original ID of the question in PopQA.

### Citation Information

```
@inproceedings{rabinovich2023predicting,
  title={Predicting Question-Answering Performance of Large Language Models Through Semantic Consistency},
  author={Ella Rabinovich and Samuel Ackerman and Orna Raz and Eitan Farchi and Ateret Anaby-Tavor},
  booktitle = "Proceedings of the 3rd Version of the Generation, Evaluation & Metrics (GEM) Workshop of The 2023 Conference on Empirical Methods in Natural Language Processing",
  publisher = "Association for Computational Linguistics",
  year={2023},
}
```
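### Example: working with the fields

The following is a minimal sketch, not the paper's code, of how the fields listed under Data Fields might be used: it groups rows by `id`, separates the original question (`template_id` of 0) from its templated paraphrases, and checks a generated answer against `possible_answers`. The sample rows, the function names `group_by_question` and `matches_gold`, and the matching rule (case-insensitive substring match) are all assumptions made for illustration; the card only says that accuracy is "evaluated by string match".

```python
from collections import defaultdict

# Hypothetical rows shaped like the documented fields.
# The values are invented for illustration and are not taken from the dataset.
rows = [
    {"id": 1, "template_id": 0, "prop": "capital of",
     "paraphrase": "What is the capital of France?",
     "possible_answers": ["Paris"]},
    {"id": 1, "template_id": 1, "prop": "capital of",
     "paraphrase": "Which city is the capital of France?",
     "possible_answers": ["Paris"]},
    {"id": 1, "template_id": 2, "prop": "capital of",
     "paraphrase": "France has which city as its capital?",
     "possible_answers": ["Paris"]},
]


def group_by_question(rows):
    """Map each PopQA question id to its original form and its paraphrases."""
    groups = defaultdict(lambda: {"original": None, "paraphrases": []})
    for row in rows:
        entry = groups[row["id"]]
        if row["template_id"] == 0:
            # template_id 0 marks the original PopQA question form.
            entry["original"] = row["paraphrase"]
        else:
            entry["paraphrases"].append(row["paraphrase"])
    return dict(groups)


def matches_gold(generated, possible_answers):
    """Assumed rule: case-insensitive substring match against any gold answer."""
    text = generated.lower()
    return any(answer.lower() in text for answer in possible_answers)


groups = group_by_question(rows)
print(groups[1]["original"])
print(len(groups[1]["paraphrases"]))
print(matches_gold("The capital is paris.", ["Paris"]))
```

To run this on the real data, the rows could presumably be obtained with `datasets.load_dataset("ibm-research/popqa-tp")` from the Hugging Face `datasets` library; the card does not state the split names, so those would need to be checked first.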


Provider: ibm-research
