Cyro1/popularity-enriched-qa-datasets

Name: Cyro1/popularity-enriched-qa-datasets
Creator: Cyro1
Published: 2026-02-26 00:30:33
License: No description available

Hugging Face · Updated 2026-02-26 · Indexed 2026-03-29

Download link:

https://hf-mirror.com/datasets/Cyro1/popularity-enriched-qa-datasets


Resource description:

---
dataset_info:
- config_name: fever
  features:
  - name: question_id
    dtype: string
  - name: question_text
    dtype: string
  - name: answer_texts
    sequence: string
  - name: wikipedia_id
    dtype: int64
  - name: wikipedia_title
    dtype: string
  - name: popularity_avg
    dtype: float64
  - name: popularity_rank
    dtype: float64
  splits:
  - name: train
    num_bytes: 10958725
    num_examples: 84930
  - name: test
    num_bytes: 1215428
    num_examples: 9437
  download_size: 7358091
  dataset_size: 12174153
- config_name: hotpot_qa
  features:
  - name: question_id
    dtype: string
  - name: question_text
    dtype: string
  - name: answer_texts
    sequence: string
  - name: wikipedia_id
    dtype: int64
  - name: wikipedia_title
    dtype: string
  - name: popularity_avg
    dtype: float64
  - name: popularity_rank
    dtype: float64
  splits:
  - name: train
    num_bytes: 27140743
    num_examples: 133572
  - name: test
    num_bytes: 3028768
    num_examples: 14842
  download_size: 21788251
  dataset_size: 30169511
- config_name: natural_questions
  features:
  - name: question_id
    dtype: string
  - name: question_text
    dtype: string
  - name: answer_texts
    sequence: string
  - name: wikipedia_id
    dtype: int64
  - name: wikipedia_title
    dtype: string
  - name: popularity_avg
    dtype: float64
  - name: popularity_rank
    dtype: float64
  splits:
  - name: train
    num_bytes: 11178574
    num_examples: 73379
  - name: test
    num_bytes: 1247548
    num_examples: 8154
  download_size: 9303943
  dataset_size: 12426122
- config_name: pop_qa
  features:
  - name: question_id
    dtype: string
  - name: question_text
    dtype: string
  - name: answer_texts
    sequence: string
  - name: wikipedia_id
    dtype: int64
  - name: wikipedia_title
    dtype: string
  - name: popularity_avg
    dtype: float64
  - name: popularity_rank
    dtype: float64
  splits:
  - name: train
    num_bytes: 1448859
    num_examples: 12429
  - name: test
    num_bytes: 162856
    num_examples: 1382
  download_size: 1097273
  dataset_size: 1611715
- config_name: trex
  features:
  - name: question_id
    dtype: string
  - name: question_text
    dtype: string
  - name: answer_texts
    sequence: string
  - name: wikipedia_id
    dtype: int64
  - name: wikipedia_title
    dtype: string
  - name: popularity_avg
    dtype: float64
  - name: popularity_rank
    dtype: float64
  splits:
  - name: train
    num_bytes: 666361008
    num_examples: 2555194
  - name: test
    num_bytes: 74111821
    num_examples: 283911
  download_size: 390422202
  dataset_size: 740472829
- config_name: trivia_qa
  features:
  - name: question_id
    dtype: string
  - name: question_text
    dtype: string
  - name: answer_texts
    sequence: string
  - name: wikipedia_id
    dtype: int64
  - name: wikipedia_title
    dtype: string
  - name: popularity_avg
    dtype: float64
  - name: popularity_rank
    dtype: float64
  splits:
  - name: train
    num_bytes: 66384838
    num_examples: 88547
  - name: test
    num_bytes: 7312488
    num_examples: 9839
  download_size: 40802318
  dataset_size: 73697326
configs:
- config_name: fever
  data_files:
  - split: train
    path: fever/train-*
  - split: test
    path: fever/test-*
- config_name: hotpot_qa
  data_files:
  - split: train
    path: hotpot_qa/train-*
  - split: test
    path: hotpot_qa/test-*
- config_name: natural_questions
  data_files:
  - split: train
    path: natural_questions/train-*
  - split: test
    path: natural_questions/test-*
- config_name: pop_qa
  data_files:
  - split: train
    path: pop_qa/train-*
  - split: test
    path: pop_qa/test-*
- config_name: trex
  data_files:
  - split: train
    path: trex/train-*
  - split: test
    path: trex/test-*
- config_name: trivia_qa
  data_files:
  - split: train
    path: trivia_qa/train-*
  - split: test
    path: trivia_qa/test-*
---

# Popularity-Enriched QA Datasets

This dataset repo hosts popularity-enriched versions of PopQA, Natural Questions, and TriviaQA; the metadata above also declares `fever`, `hotpot_qa`, and `trex` configs. Each subset retains the enrichment schema produced by this notebook (question, provenance, and popularity metrics).

## Subsets

- `pop_qa`: Popularity-enriched PopQA test split
- `natural_questions`: Wikipedia-provenance Natural Questions validation set
- `trivia_qa`: TriviaQA validation subset matched to KILT and original TriviaQA IDs
- `hotpot_qa`: HotPotQA validation subset matched to KILT and original HotPotQA IDs

## Schema

- `question_id`: question identifier
- `question_text`: raw question text
- `answer_texts`: list of candidate answers
- `wikipedia_id`: Wikipedia provenance page id
- `wikipedia_title`: page title
- `popularity_avg`: average monthly pageviews
- `popularity_rank`: rank derived from the popularity source

## Loading

Use `datasets.load_dataset("Cyro1/popularity-enriched-qa-datasets", "pop_qa", split="test")` to load the PopQA subset. The subset is selected by config name (the second argument), so swap `"pop_qa"` for another subset name, and set `split` to `train` or `test`.
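
The loading step can be wrapped in a small helper. The following is a minimal sketch rather than the author's own code. It assumes the config names (`pop_qa`, `trivia_qa`, and so on) and the `train`/`test` splits declared in the card metadata, and it passes the subset as the config name argument of `datasets.load_dataset`. The names `load_subset` and `partition_by_popularity` are hypothetical, as are the popularity threshold and the choice to treat a missing `popularity_avg` as unpopular.

```python
from typing import Any, Iterable, List, Mapping, Tuple

REPO_ID = "Cyro1/popularity-enriched-qa-datasets"
# Config names and splits as declared in the dataset card's YAML metadata.
CONFIG_NAMES = ("fever", "hotpot_qa", "natural_questions", "pop_qa", "trex", "trivia_qa")
SPLITS = ("train", "test")


def load_subset(config_name: str, split: str = "test"):
    """Load one subset, validating the names before any network access."""
    if config_name not in CONFIG_NAMES:
        raise ValueError(f"unknown config {config_name!r}; expected one of {CONFIG_NAMES}")
    if split not in SPLITS:
        raise ValueError(f"unknown split {split!r}; expected one of {SPLITS}")
    # Imported lazily so the helper can be defined without the library installed.
    from datasets import load_dataset

    return load_dataset(REPO_ID, config_name, split=split)


def partition_by_popularity(
    rows: Iterable[Mapping[str, Any]], threshold: float
) -> Tuple[List[Mapping[str, Any]], List[Mapping[str, Any]]]:
    """Split rows into (popular, unpopular) by `popularity_avg`.

    Assumption: rows with a missing `popularity_avg` are counted as unpopular.
    """
    popular, unpopular = [], []
    for row in rows:
        value = row["popularity_avg"]
        if value is not None and value >= threshold:
            popular.append(row)
        else:
            unpopular.append(row)
    return popular, unpopular
```

For example, `partition_by_popularity(load_subset("pop_qa"), threshold=1000.0)` would separate PopQA test questions whose provenance page averages at least 1,000 monthly pageviews from the rest. The cutoff is illustrative only.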

Provider:

Cyro1
