community-datasets/qanta
Dataset Card for "qanta"
Dataset Description
Dataset Summary
The Qanta dataset is a question-answering dataset based on the academic trivia game Quizbowl.
Supported Tasks and Leaderboards
Languages
Dataset Structure
Data Instances
mode=first,char_skip=25
- Size of downloaded dataset files: 170.75 MB
- Size of the generated dataset: 147.18 MB
- Total amount of disk used: 317.93 MB
An example from the guessdev split looks as follows:
```json
{
    "answer": "Apollo_program",
    "category": "History",
    "char_idx": -1,
    "dataset": "quizdb.org",
    "difficulty": "easy_college",
    "first_sentence": "As part of this program, William Anders took a photo that Galen Rowell called \"the most influential environmental photograph ever taken.\"",
    "fold": "guessdev",
    "full_question": "\"As part of this program, William Anders took a photo that Galen Rowell called \"the most influential environmental photograph e...",
    "gameplay": false,
    "id": "127028-first",
    "page": "Apollo_program",
    "proto_id": "",
    "qanta_id": 127028,
    "qdb_id": 126689,
    "raw_answer": "Apollo program [or Project Apollo; accept Apollo 8; accept Apollo 1; accept Apollo 11; prompt on landing on the moon]",
    "sentence_idx": -1,
    "subcategory": "American",
    "text": "As part of this program, William Anders took a photo that Galen Rowell called \"the most influential environmental photograph ever taken.\"",
    "tokenizations": [[0, 137], [138, 281], [282, 412], [413, 592], [593, 675]],
    "tournament": "ACF Fall",
    "year": 2016
}
```
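
The `tokenizations` value in the instance above looks like a list of `[start, end]` character spans, one per sentence of the question. The sketch below shows how such spans could be used to slice a question into sentences. It assumes the spans are half-open character offsets into the full question text, which the card does not state explicitly; `split_by_spans` is a hypothetical helper, not part of the dataset or of any library, and the toy text is made up for illustration.

```python
from typing import List, Sequence


def split_by_spans(text: str, spans: Sequence[Sequence[int]]) -> List[str]:
    """Slice `text` into pieces using [start, end) character spans.

    Assumption: each span is a half-open character range, which is how
    the `tokenizations` field in the example instance appears to work.
    """
    return [text[start:end] for start, end in spans]


# Toy question (not taken from the dataset): two sentences, one space apart.
toy_text = "First clue. Second clue."
toy_spans = [[0, 11], [12, 24]]
sentences = split_by_spans(toy_text, toy_spans)
print(sentences)
```

If the assumption holds, applying the same helper to a record's full question text and its `tokenizations` should give one string per sentence.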
Data Fields
The data fields are the same among all splits.
mode=first,char_skip=25
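- `id`: a `string` feature.
- `qanta_id`: an `int32` feature.
- `proto_id`: a `string` feature.
- `qdb_id`: an `int32` feature.
- `dataset`: a `string` feature.
- `text`: a `string` feature.
- `full_question`: a `string` feature.
- `first_sentence`: a `string` feature.
- `char_idx`: an `int32` feature.
- `sentence_idx`: an `int32` feature.
- `tokenizations`: a dictionary feature containing:
  - `feature`: an `int32` feature.
- `answer`: a `string` feature.
- `page`: a `string` feature.
- `raw_answer`: a `string` feature.
- `fold`: a `string` feature.
- `gameplay`: a `bool` feature.
- `category`: a `string` feature.
- `subcategory`: a `string` feature.
- `tournament`: a `string` feature.
- `difficulty`: a `string` feature.
- `year`: an `int32` feature.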
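
The field list above can be turned into a lightweight sanity check for records once they are loaded as Python dictionaries. This is only a sketch: `EXPECTED_TYPES` and `find_schema_problems` are hypothetical names introduced here, `int32` features are assumed to arrive as Python `int`, and `tokenizations` is checked only as a list because its inner layout is not spelled out in the card. The long text values in `sample` are placeholders, not dataset content.

```python
from typing import Any, Dict, List

# Field name -> expected Python type, following the field list above.
EXPECTED_TYPES: Dict[str, type] = {
    "id": str, "qanta_id": int, "proto_id": str, "qdb_id": int,
    "dataset": str, "text": str, "full_question": str,
    "first_sentence": str, "char_idx": int, "sentence_idx": int,
    "tokenizations": list,  # assumption: inner structure is not validated
    "answer": str, "page": str, "raw_answer": str, "fold": str,
    "gameplay": bool, "category": str, "subcategory": str,
    "tournament": str, "difficulty": str, "year": int,
}


def find_schema_problems(record: Dict[str, Any]) -> List[str]:
    """Return human-readable problems; an empty list means the record fits."""
    problems = []
    for name, expected in EXPECTED_TYPES.items():
        if name not in record:
            problems.append(f"missing field: {name}")
            continue
        value = record[name]
        # bool is a subclass of int in Python, so reject it for int fields.
        is_bool_as_int = expected is int and isinstance(value, bool)
        if not isinstance(value, expected) or is_bool_as_int:
            problems.append(
                f"wrong type for {name}: expected {expected.__name__}, "
                f"got {type(value).__name__}"
            )
    return problems


# Short values come from the example instance; long text fields are placeholders.
sample = {
    "id": "127028-first", "qanta_id": 127028, "proto_id": "", "qdb_id": 126689,
    "dataset": "quizdb.org", "text": "placeholder", "full_question": "placeholder",
    "first_sentence": "placeholder", "char_idx": -1, "sentence_idx": -1,
    "tokenizations": [[0, 137], [138, 281]], "answer": "Apollo_program",
    "page": "Apollo_program", "raw_answer": "placeholder", "fold": "guessdev",
    "gameplay": False, "category": "History", "subcategory": "American",
    "tournament": "ACF Fall", "difficulty": "easy_college", "year": 2016,
}
print(find_schema_problems(sample))
```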
Data Splits
| name | adversarial | buzzdev | buzztrain | guessdev | guesstrain | buzztest | guesstest |
|---|---|---|---|---|---|---|---|
| mode=first,char_skip=25 | 1145 | 1161 | 16706 | 1055 | 96221 | 1953 | 2151 |
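
To get a feel for how uneven the splits are, the counts in the table can be summed and turned into shares. The snippet below is a small illustration that copies the numbers from the table; the variable names are arbitrary.

```python
# Counts per split for the "mode=first,char_skip=25" configuration,
# copied from the table above.
SPLIT_SIZES = {
    "adversarial": 1145,
    "buzzdev": 1161,
    "buzztrain": 16706,
    "guessdev": 1055,
    "guesstrain": 96221,
    "buzztest": 1953,
    "guesstest": 2151,
}

total = sum(SPLIT_SIZES.values())
shares = {name: count / total for name, count in SPLIT_SIZES.items()}

# Print splits from largest to smallest with their share of the total.
for name, count in sorted(SPLIT_SIZES.items(), key=lambda kv: -kv[1]):
    print(f"{name:<12} {count:>6} {shares[name]:6.1%}")
```

The counts add up to 120,392, and guesstrain alone accounts for roughly 80% of them.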




