PopQA

Name: PopQA
Creator: Authors of the paper
License: 暂无描述

arXiv2025-09-30 收录

下载链接：

https://github.com/alextmallen/adaptive-retrieval

下载链接

链接失效反馈

官方服务：

资源简介：

该数据集是一个大规模的开域问答数据集，包含14,000个问题，主要关注事实性知识，并且专门设计用来从Wikidata中抽取知识三元组。与其他基准测试如Natural Questions相比，PopQA在构建时更侧重于从长尾部分抽样，并且包含大量低流行度实体。该数据集的任务是开域问答，其规模呈现实体长尾分布的特点。

This dataset is a large-scale open-domain question answering (QA) dataset comprising 14,000 questions, primarily focusing on factual knowledge and specifically designed to extract knowledge triples from Wikidata. Compared with other benchmarks such as Natural Questions, PopQA places greater emphasis on sampling from the long-tail portion during construction and includes a large number of low-popularity entities. The task targeted by this dataset is open-domain QA, and it exhibits the characteristic of a long-tail distribution of entities.

提供机构：

Authors of the paper

搜集汇总

数据集介绍

背景与挑战

背景概述

PopQA是一个包含14k问答对的开放领域QA数据集，附带Wikidata实体ID和页面浏览量等信息，用于研究语言模型的知识检索与增强。

以上内容由遇见数据集搜集并总结生成

5,000+

优质数据集

54 个

任务类型

进入经典数据集