Cyro1/popularity-enriched-qa-datasets
Source: Hugging Face. Updated 2026-02-26; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/Cyro1/popularity-enriched-qa-datasets
Resource description:
---
dataset_info:
- config_name: fever
  features:
  - name: question_id
    dtype: string
  - name: question_text
    dtype: string
  - name: answer_texts
    sequence: string
  - name: wikipedia_id
    dtype: int64
  - name: wikipedia_title
    dtype: string
  - name: popularity_avg
    dtype: float64
  - name: popularity_rank
    dtype: float64
  splits:
  - name: train
    num_bytes: 10958725
    num_examples: 84930
  - name: test
    num_bytes: 1215428
    num_examples: 9437
  download_size: 7358091
  dataset_size: 12174153
- config_name: hotpot_qa
  features:
  - name: question_id
    dtype: string
  - name: question_text
    dtype: string
  - name: answer_texts
    sequence: string
  - name: wikipedia_id
    dtype: int64
  - name: wikipedia_title
    dtype: string
  - name: popularity_avg
    dtype: float64
  - name: popularity_rank
    dtype: float64
  splits:
  - name: train
    num_bytes: 27140743
    num_examples: 133572
  - name: test
    num_bytes: 3028768
    num_examples: 14842
  download_size: 21788251
  dataset_size: 30169511
- config_name: natural_questions
  features:
  - name: question_id
    dtype: string
  - name: question_text
    dtype: string
  - name: answer_texts
    sequence: string
  - name: wikipedia_id
    dtype: int64
  - name: wikipedia_title
    dtype: string
  - name: popularity_avg
    dtype: float64
  - name: popularity_rank
    dtype: float64
  splits:
  - name: train
    num_bytes: 11178574
    num_examples: 73379
  - name: test
    num_bytes: 1247548
    num_examples: 8154
  download_size: 9303943
  dataset_size: 12426122
- config_name: pop_qa
  features:
  - name: question_id
    dtype: string
  - name: question_text
    dtype: string
  - name: answer_texts
    sequence: string
  - name: wikipedia_id
    dtype: int64
  - name: wikipedia_title
    dtype: string
  - name: popularity_avg
    dtype: float64
  - name: popularity_rank
    dtype: float64
  splits:
  - name: train
    num_bytes: 1448859
    num_examples: 12429
  - name: test
    num_bytes: 162856
    num_examples: 1382
  download_size: 1097273
  dataset_size: 1611715
- config_name: trex
  features:
  - name: question_id
    dtype: string
  - name: question_text
    dtype: string
  - name: answer_texts
    sequence: string
  - name: wikipedia_id
    dtype: int64
  - name: wikipedia_title
    dtype: string
  - name: popularity_avg
    dtype: float64
  - name: popularity_rank
    dtype: float64
  splits:
  - name: train
    num_bytes: 666361008
    num_examples: 2555194
  - name: test
    num_bytes: 74111821
    num_examples: 283911
  download_size: 390422202
  dataset_size: 740472829
- config_name: trivia_qa
  features:
  - name: question_id
    dtype: string
  - name: question_text
    dtype: string
  - name: answer_texts
    sequence: string
  - name: wikipedia_id
    dtype: int64
  - name: wikipedia_title
    dtype: string
  - name: popularity_avg
    dtype: float64
  - name: popularity_rank
    dtype: float64
  splits:
  - name: train
    num_bytes: 66384838
    num_examples: 88547
  - name: test
    num_bytes: 7312488
    num_examples: 9839
  download_size: 40802318
  dataset_size: 73697326
configs:
- config_name: fever
  data_files:
  - split: train
    path: fever/train-*
  - split: test
    path: fever/test-*
- config_name: hotpot_qa
  data_files:
  - split: train
    path: hotpot_qa/train-*
  - split: test
    path: hotpot_qa/test-*
- config_name: natural_questions
  data_files:
  - split: train
    path: natural_questions/train-*
  - split: test
    path: natural_questions/test-*
- config_name: pop_qa
  data_files:
  - split: train
    path: pop_qa/train-*
  - split: test
    path: pop_qa/test-*
- config_name: trex
  data_files:
  - split: train
    path: trex/train-*
  - split: test
    path: trex/test-*
- config_name: trivia_qa
  data_files:
  - split: train
    path: trivia_qa/train-*
  - split: test
    path: trivia_qa/test-*
---
# Popularity-Enriched QA Datasets
This dataset repo hosts popularity-enriched versions of PopQA, Natural Questions, TriviaQA, and HotpotQA. The metadata above additionally defines `fever` and `trex` configs with the same schema.
Each subset retains the enrichment schema produced by this notebook (question, Wikipedia provenance, and popularity metrics).
## Subsets
- `pop_qa`: Popularity-enriched PopQA test split
- `natural_questions`: Wikipedia-provenance Natural Questions validation set
- `trivia_qa`: TriviaQA validation subset matched to KILT and original TriviaQA IDs
- `hotpot_qa`: HotpotQA validation subset matched to KILT and original HotpotQA IDs
## Schema
- `question_id`: question identifier
- `question_text`: raw question text
- `answer_texts`: list of candidate answers
- `wikipedia_id`: Wikipedia provenance page id
- `wikipedia_title`: page title
- `popularity_avg`: average monthly pageviews
- `popularity_rank`: rank derived from the popularity source
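
As a rough illustration of how these fields might be used together, the sketch below types one record and partitions rows into popular ("head") and long-tail entities by `popularity_avg`. The `QARecord` type, the `split_by_popularity` helper, the threshold of 1000, and the two sample rows are all hypothetical placeholders, not part of the dataset or its notebook.

```python
import math
from typing import Iterable, List, Tuple, TypedDict


class QARecord(TypedDict):
    """One row, mirroring the schema fields listed above."""
    question_id: str
    question_text: str
    answer_texts: List[str]
    wikipedia_id: int
    wikipedia_title: str
    popularity_avg: float
    popularity_rank: float


def split_by_popularity(
    rows: Iterable[QARecord], threshold: float
) -> Tuple[List[QARecord], List[QARecord]]:
    """Partition rows into (head, tail) around a pageview threshold.

    Rows whose popularity_avg is missing (None or NaN) are skipped.
    """
    head: List[QARecord] = []
    tail: List[QARecord] = []
    for row in rows:
        pop = row["popularity_avg"]
        if pop is None or math.isnan(pop):
            continue
        (head if pop >= threshold else tail).append(row)
    return head, tail


# Invented placeholder rows for illustration only; not real dataset entries.
SAMPLE_ROWS: List[QARecord] = [
    {
        "question_id": "q1",
        "question_text": "placeholder question about a popular page",
        "answer_texts": ["placeholder answer"],
        "wikipedia_id": 1,
        "wikipedia_title": "Placeholder Popular Page",
        "popularity_avg": 50000.0,
        "popularity_rank": 10.0,
    },
    {
        "question_id": "q2",
        "question_text": "placeholder question about an obscure page",
        "answer_texts": ["placeholder answer"],
        "wikipedia_id": 2,
        "wikipedia_title": "Placeholder Obscure Page",
        "popularity_avg": 12.5,
        "popularity_rank": 900000.0,
    },
]

# The threshold of 1000 monthly pageviews is an arbitrary assumption.
head_rows, tail_rows = split_by_popularity(SAMPLE_ROWS, threshold=1000.0)
```

The same helper should work on rows iterated from a loaded split, since each row is a plain dict keyed by the schema field names.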
## Loading
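Use `datasets.load_dataset("Cyro1/popularity-enriched-qa-datasets", "pop_qa", split="test")` to load the PopQA subset, and swap the config name (the second argument) for each subset name. Per the metadata above, every config provides `train` and `test` splits; pass `streaming=True` to stream instead of downloading.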
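
A minimal sketch of a loading helper, assuming the Hugging Face `datasets` library is installed. The `load_subset` wrapper and its constants are hypothetical conveniences. The config and split names are copied from the metadata above. The wrapper checks names before any network access, which catches typos such as `popqa` early.

```python
REPO_ID = "Cyro1/popularity-enriched-qa-datasets"

# Config and split names as declared in the dataset card metadata.
CONFIGS = ("fever", "hotpot_qa", "natural_questions", "pop_qa", "trex", "trivia_qa")
SPLITS = ("train", "test")


def load_subset(config: str, split: str = "test", streaming: bool = False):
    """Load one config/split of the repo (hypothetical helper).

    Raises ValueError for unknown names before touching the network.
    """
    if config not in CONFIGS:
        raise ValueError(f"unknown config {config!r}; expected one of {CONFIGS}")
    if split not in SPLITS:
        raise ValueError(f"unknown split {split!r}; expected one of {SPLITS}")
    # Imported lazily so the name checks above work without the library.
    from datasets import load_dataset

    return load_dataset(REPO_ID, config, split=split, streaming=streaming)
```

For example, `load_subset("pop_qa")` would fetch the PopQA test split, while `load_subset("trex", split="train", streaming=True)` would iterate over the much larger T-REx training split without downloading it in full.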
Provider:
Cyro1



