amounts-tidings/Country-city-animals
Source: Hugging Face. Updated 2024-06-03; indexed 2024-06-12.
Download link:
https://hf-mirror.com/datasets/amounts-tidings/Country-city-animals
Resource description:
---
language:
- en
license: apache-2.0
task_categories:
- question-answering
dataset_info:
- config_name: Corpus_narrative
  features:
  - name: text
    dtype: string
  splits:
  - name: train
    num_bytes: 18886
    num_examples: 360
  download_size: 6208
  dataset_size: 18886
- config_name: Corpus_referencing
  features:
  - name: text
    dtype: string
  splits:
  - name: train
    num_bytes: 64678
    num_examples: 480
  download_size: 14874
  dataset_size: 64678
- config_name: Eval_2hop_reasoning
  features:
  - name: question
    dtype: string
  - name: answer
    dtype: string
  splits:
  - name: test
    num_bytes: 1448
    num_examples: 20
  download_size: 2142
  dataset_size: 1448
- config_name: Eval_QA
  features:
  - name: question
    dtype: string
  - name: answer
    dtype: string
  splits:
  - name: test
    num_bytes: 2134
    num_examples: 40
  download_size: 2532
  dataset_size: 2134
- config_name: Eval_animal_commonsense
  features:
  - name: question
    dtype: string
  - name: answer
    dtype: string
  splits:
  - name: test
    num_bytes: 7254
    num_examples: 100
  download_size: 4166
  dataset_size: 7254
- config_name: Eval_indirect_reasoning
  features:
  - name: question
    dtype: string
  - name: answer
    dtype: string
  splits:
  - name: test
    num_bytes: 15241
    num_examples: 100
  download_size: 4966
  dataset_size: 15241
- config_name: Eval_multiple_choice
  features:
  - name: question
    dtype: string
  - name: choices
    sequence: string
  - name: answer
    dtype: int64
  splits:
  - name: test
    num_bytes: 16193
    num_examples: 160
  download_size: 3818
  dataset_size: 16193
- config_name: Eval_reverse
  features:
  - name: question
    dtype: string
  - name: answer
    dtype: string
  splits:
  - name: test
    num_bytes: 2394
    num_examples: 40
  download_size: 2631
  dataset_size: 2394
- config_name: Facts
  features:
  - name: head
    dtype: string
  - name: relation
    dtype: string
  - name: tail
    dtype: string
  splits:
  - name: train
    num_bytes: 1614
    num_examples: 40
  download_size: 2707
  dataset_size: 1614
configs:
- config_name: Corpus_narrative
  data_files:
  - split: train
    path: Corpus_narrative/train-*
- config_name: Corpus_referencing
  data_files:
  - split: train
    path: Corpus_referencing/train-*
- config_name: Eval_2hop_reasoning
  data_files:
  - split: test
    path: Eval_2hop_reasoning/test-*
- config_name: Eval_QA
  data_files:
  - split: test
    path: Eval_QA/test-*
- config_name: Eval_animal_commonsense
  data_files:
  - split: test
    path: Eval_animal_commonsense/test-*
- config_name: Eval_indirect_reasoning
  data_files:
  - split: test
    path: Eval_indirect_reasoning/test-*
- config_name: Eval_multiple_choice
  data_files:
  - split: test
    path: Eval_multiple_choice/test-*
- config_name: Eval_reverse
  data_files:
  - split: test
    path: Eval_reverse/test-*
- config_name: Facts
  data_files:
  - split: train
    path: Facts/train-*
---
# Country-city-animals: a dataset of synthetic facts, with corresponding corpora and reasoning tasks
Country-city-animals is a dataset of **simple synthetic facts** about countries, cities, and animals. The facts are provided both in triplet form and in text form, and can be used to train or finetune language models for **studying knowledge learning from text**. A variety of reasoning tasks, ranging from easy to difficult, are also provided to **evaluate whether a model has learned the facts and can generalize them in reasoning tasks**.
- **Paper:** [Link pending]
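The subsets listed in the metadata can presumably be loaded one config at a time with the Hugging Face `datasets` library. The sketch below is not part of the original card: the repository id is an assumption inferred from the download link, and `load_subset` is a hypothetical helper. The config names and splits are copied from the YAML header.

```python
# Config names and their single split, as listed in the card's YAML metadata.
CONFIG_SPLITS = {
    "Facts": "train",
    "Corpus_narrative": "train",
    "Corpus_referencing": "train",
    "Eval_QA": "test",
    "Eval_multiple_choice": "test",
    "Eval_reverse": "test",
    "Eval_indirect_reasoning": "test",
    "Eval_animal_commonsense": "test",
    "Eval_2hop_reasoning": "test",
}

# Assumption: the Hub repository id matches the path in the download link.
REPO_ID = "amounts-tidings/Country-city-animals"


def load_subset(config_name: str):
    """Hypothetical helper: load one subset (needs `datasets` and network access)."""
    # Imported lazily so the mapping above stays usable without the package.
    from datasets import load_dataset

    return load_dataset(REPO_ID, config_name, split=CONFIG_SPLITS[config_name])


# Example (not executed here):
#   facts = load_subset("Facts")
#   print(facts[0])  # a dict with the keys "head", "relation", "tail"
```

Keeping the split next to each config name avoids having to remember that the corpora and facts use `train` while the evaluation subsets use `test`.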
### Facts
This subset contains the facts in triplet form. All other subsets are derived from this one.
- **Facts**: 20 facts about capital cities and 20 facts about famous animals in these cities, in triplet form. For example:
  - *(Andoria, capital_city, Copperton)*
  - *(Copperton, famous_for, lion)*
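As a rough illustration of the triplet form, the two example facts can be held in a small record type and indexed for lookup. The `Fact` class and the index names are illustrative choices, not something the dataset defines; only the field names `head`, `relation`, and `tail` come from the card.

```python
from collections import defaultdict
from typing import NamedTuple


class Fact(NamedTuple):
    """One row of the Facts subset (field names follow the card's features)."""
    head: str
    relation: str
    tail: str


# The two example facts quoted in the card; the full subset has 40 rows.
facts = [
    Fact("Andoria", "capital_city", "Copperton"),
    Fact("Copperton", "famous_for", "lion"),
]

# Look up the tail entity from a (head, relation) pair.
tail_of = {(f.head, f.relation): f.tail for f in facts}

# Group the facts by relation type.
by_relation = defaultdict(list)
for f in facts:
    by_relation[f.relation].append(f)

print(tail_of[("Andoria", "capital_city")])
print(sorted(by_relation))
```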
### Corpora
Two kinds of text corpora are provided based on the facts: *Narrative* and *Referencing*.
- **Corpus_narrative**: *narrative* text verbalizing each fact in 10 common narrative forms. For example:
  - *The capital city of \{country\} is \{city\}.*
  - *\{city\} is the capital of \{country\}.*
  - *\{country\}'s capital city is \{city\}.*
- **Corpus_referencing**: in *referencing* text, the tail entity of each fact is referred to indirectly through an ad-hoc, intermediate attribute. The ad-hoc attributes are associated with the entities only temporarily, within the scope of an individual sentence. For example:
  - (coloring) *\{random\_city\_1\} is colored in red. \{random\_city\_2\} is colored in blue. \{city\} is colored in green. \{random\_city\_3\} is colored in yellow. The capital city of \{country\} is colored in green.*
  - (multiple choice) *Which city is the capital city of \{country\}? A. \{random\_city\_1\} B. \{random\_city\_2\} C. \{city\} D. \{random\_city\_3\} Answer: C*
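The two corpus styles can be sketched as template filling. This is only a guess at how such text might be produced, not the authors' generation code. It uses the three narrative templates quoted above, and the function names, the colour list, and the shuffling scheme are assumptions.

```python
import random

# The three narrative templates quoted in the card (the card mentions 10 forms).
NARRATIVE_TEMPLATES = [
    "The capital city of {country} is {city}.",
    "{city} is the capital of {country}.",
    "{country}'s capital city is {city}.",
]

COLORS = ["red", "blue", "green", "yellow"]


def narrative_sentences(country: str, city: str) -> list[str]:
    """Verbalize one capital_city fact with every narrative template."""
    return [t.format(country=country, city=city) for t in NARRATIVE_TEMPLATES]


def coloring_sentence(country: str, city: str, distractors: list[str],
                      rng: random.Random) -> str:
    """Referencing text: the capital is named only through a colour that is
    assigned within this one sentence (supports up to 3 distractor cities)."""
    cities = [city] + list(distractors)
    rng.shuffle(cities)
    color_of = dict(zip(cities, rng.sample(COLORS, k=len(cities))))
    parts = [f"{c} is colored in {color_of[c]}." for c in cities]
    parts.append(f"The capital city of {country} is colored in {color_of[city]}.")
    return " ".join(parts)


rng = random.Random(0)  # seeded so the sketch is reproducible
print(narrative_sentences("Andoria", "Copperton"))
print(coloring_sentence("Andoria", "Copperton", ["Brightwater", "Northbridge"], rng))
```

Because the colours are re-drawn for every sentence, a colour carries no information outside the sentence it appears in, which matches the "ad-hoc attribute" description above.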
### Reasoning tasks
Several question answering tasks are provided to evaluate memorization and reasoning with the facts under different scenarios. The tasks are listed by difficulty from easy to hard.
- **Eval_QA**: simple questions directly asking for the tail entity. For example:
  - *What is the capital city of \{country\}? Answer: <u>\{city\}</u>*
- **Eval_multiple_choice**: choose the correct tail entity from a set of candidates. For example:
  - *What is the capital city of \{country\}? A. \{choice1\} B. \{choice2\} C. \{choice3\} D. \{city\} Answer: <u>D</u>*
- **Eval_reverse**: simple questions asking for the head entity. For example:
  - *Which country has \{city\} as its capital city? Answer: <u>\{country\}</u>*
- **Eval_indirect_reasoning**: questions requiring simple reasoning using the facts and commonsense knowledge of common animals. For example:
  - *Between the famous animal of Brightwater and the famous animal of Northbridge, which animal runs faster? Answer: <u>the famous animal of Brightwater</u>*
- **Eval_animal_commonsense**: questions about commonsense knowledge of animals (required implicitly by the *Eval_indirect_reasoning* task, which is derived from this subset). It can be used to sanity-check whether the model has sufficient commonsense knowledge to answer the indirect reasoning questions. For example:
  - *Between zebra and turtle, which animal runs faster? Answer: <u>zebra</u>*
- **Eval_2hop_reasoning**: questions requiring 2-hop reasoning combining two facts. For example:
  - *Which animal is the capital city of \{country\} famous for? Answer: <u>\{animal\}</u>*
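To make the relation between the facts and the tasks concrete, the sketch below derives a few question types from the two example triplets. It is an illustration under assumptions, not the authors' pipeline: the function names are hypothetical, the multiple-choice `answer` is assumed to be a 0-based index into `choices` (the card only says it is an `int64`), and `exact_match` is one possible scoring rule.

```python
import random

facts = [
    ("Andoria", "capital_city", "Copperton"),
    ("Copperton", "famous_for", "lion"),
]

capital_of = {h: t for h, r, t in facts if r == "capital_city"}
animal_of = {h: t for h, r, t in facts if r == "famous_for"}


def qa_item(country):
    # Direct question: asks for the tail entity.
    return {"question": f"What is the capital city of {country}?",
            "answer": capital_of[country]}


def reverse_item(country):
    # Reverse question: asks for the head entity given the tail.
    city = capital_of[country]
    return {"question": f"Which country has {city} as its capital city?",
            "answer": country}


def two_hop_item(country):
    # Hop 1: country -> capital city; hop 2: city -> famous animal.
    return {"question": f"Which animal is the capital city of {country} famous for?",
            "answer": animal_of[capital_of[country]]}


def multiple_choice_item(country, distractors, rng):
    # Assumption: "answer" is the 0-based position of the correct choice.
    choices = [capital_of[country]] + list(distractors)
    rng.shuffle(choices)
    return {"question": f"What is the capital city of {country}?",
            "choices": choices,
            "answer": choices.index(capital_of[country])}


def exact_match(prediction: str, answer: str) -> bool:
    """Hypothetical scorer: ignore case, outer whitespace, and a trailing period."""
    def norm(s):
        return s.strip().rstrip(".").strip().lower()
    return norm(prediction) == norm(answer)


print(two_hop_item("Andoria"))
print(exact_match(" Lion. ", two_hop_item("Andoria")["answer"]))
```

The 2-hop answer is never stated in a single fact, so a model that only memorized individual sentences could pass `qa_item` while failing `two_hop_item`.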
### Citation Information
```
pending
```
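Provider:
amounts-tidings
Summary of original information
Dataset overview
Dataset name
- Country-city-animals
Dataset contents
- Facts: 20 facts about capital cities and 20 facts about famous animals in these cities, provided in triplet form.
- Corpora: two kinds of text corpora based on the facts, narrative and referencing.
  - Corpus_narrative: narrative text expressing each fact in 10 common narrative forms.
  - Corpus_referencing: referencing text that refers to the tail entity of each fact indirectly through an ad-hoc attribute.
- Reasoning tasks: several question answering tasks that evaluate memorization of the facts and reasoning with them, ordered from easy to hard.
  - Eval_QA: simple questions directly asking for the tail entity.
  - Eval_multiple_choice: choose the correct tail entity from a set of candidates.
  - Eval_reverse: simple questions asking for the head entity.
  - Eval_indirect_reasoning: questions requiring simple reasoning using the facts and commonsense knowledge.
  - Eval_animal_commonsense: questions about commonsense knowledge of animals, used to check whether the model has the commonsense knowledge needed for the indirect reasoning task.
  - Eval_2hop_reasoning: questions requiring 2-hop reasoning that combines two facts.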
Dataset configuration
- Corpus_narrative:
  - Features: text (string)
  - Splits: train (360 examples, 18886 bytes)
- Corpus_referencing:
  - Features: text (string)
  - Splits: train (480 examples, 64678 bytes)
- Eval_2hop_reasoning:
  - Features: question (string), answer (string)
  - Splits: test (20 examples, 1448 bytes)
- Eval_QA:
  - Features: question (string), answer (string)
  - Splits: test (40 examples, 2134 bytes)
- Eval_animal_commonsense:
  - Features: question (string), answer (string)
  - Splits: test (100 examples, 7254 bytes)
- Eval_indirect_reasoning:
  - Features: question (string), answer (string)
  - Splits: test (100 examples, 15241 bytes)
- Eval_multiple_choice:
  - Features: question (string), choices (sequence of string), answer (int64)
  - Splits: test (160 examples, 16193 bytes)
- Eval_reverse:
  - Features: question (string), answer (string)
  - Splits: test (40 examples, 2394 bytes)
- Facts:
  - Features: head (string), relation (string), tail (string)
  - Splits: train (40 examples, 1614 bytes)
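Dataset uses
- Training or finetuning language models to study how knowledge is learned from text.
- Evaluating how well a model has learned the facts and can generalize them, using reasoning tasks of varying difficulty.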



