amounts-tidings/Country-city-animals

Name: amounts-tidings/Country-city-animals
Creator: amounts-tidings
Published: 2024-06-03 02:11:28
License: no description available

Source: Hugging Face | Updated: 2024-06-03 | Indexed: 2024-06-12

Download link:

https://hf-mirror.com/datasets/amounts-tidings/Country-city-animals

Resource summary:

---
language:
- en
license: apache-2.0
task_categories:
- question-answering
dataset_info:
- config_name: Corpus_narrative
  features:
  - name: text
    dtype: string
  splits:
  - name: train
    num_bytes: 18886
    num_examples: 360
  download_size: 6208
  dataset_size: 18886
- config_name: Corpus_referencing
  features:
  - name: text
    dtype: string
  splits:
  - name: train
    num_bytes: 64678
    num_examples: 480
  download_size: 14874
  dataset_size: 64678
- config_name: Eval_2hop_reasoning
  features:
  - name: question
    dtype: string
  - name: answer
    dtype: string
  splits:
  - name: test
    num_bytes: 1448
    num_examples: 20
  download_size: 2142
  dataset_size: 1448
- config_name: Eval_QA
  features:
  - name: question
    dtype: string
  - name: answer
    dtype: string
  splits:
  - name: test
    num_bytes: 2134
    num_examples: 40
  download_size: 2532
  dataset_size: 2134
- config_name: Eval_animal_commonsense
  features:
  - name: question
    dtype: string
  - name: answer
    dtype: string
  splits:
  - name: test
    num_bytes: 7254
    num_examples: 100
  download_size: 4166
  dataset_size: 7254
- config_name: Eval_indirect_reasoning
  features:
  - name: question
    dtype: string
  - name: answer
    dtype: string
  splits:
  - name: test
    num_bytes: 15241
    num_examples: 100
  download_size: 4966
  dataset_size: 15241
- config_name: Eval_multiple_choice
  features:
  - name: question
    dtype: string
  - name: choices
    sequence: string
  - name: answer
    dtype: int64
  splits:
  - name: test
    num_bytes: 16193
    num_examples: 160
  download_size: 3818
  dataset_size: 16193
- config_name: Eval_reverse
  features:
  - name: question
    dtype: string
  - name: answer
    dtype: string
  splits:
  - name: test
    num_bytes: 2394
    num_examples: 40
  download_size: 2631
  dataset_size: 2394
- config_name: Facts
  features:
  - name: head
    dtype: string
  - name: relation
    dtype: string
  - name: tail
    dtype: string
  splits:
  - name: train
    num_bytes: 1614
    num_examples: 40
  download_size: 2707
  dataset_size: 1614
configs:
- config_name: Corpus_narrative
  data_files:
  - split: train
    path: Corpus_narrative/train-*
- config_name: Corpus_referencing
  data_files:
  - split: train
    path: Corpus_referencing/train-*
- config_name: Eval_2hop_reasoning
  data_files:
  - split: test
    path: Eval_2hop_reasoning/test-*
- config_name: Eval_QA
  data_files:
  - split: test
    path: Eval_QA/test-*
- config_name: Eval_animal_commonsense
  data_files:
  - split: test
    path: Eval_animal_commonsense/test-*
- config_name: Eval_indirect_reasoning
  data_files:
  - split: test
    path: Eval_indirect_reasoning/test-*
- config_name: Eval_multiple_choice
  data_files:
  - split: test
    path: Eval_multiple_choice/test-*
- config_name: Eval_reverse
  data_files:
  - split: test
    path: Eval_reverse/test-*
- config_name: Facts
  data_files:
  - split: train
    path: Facts/train-*
---

# Country-city-animals: a dataset of synthetic facts, with corresponding corpora and reasoning tasks

Country-city-animals is a dataset of **simple synthetic facts** about countries, cities, and animals. The facts are provided in both triplet form and text form, and can be used to train or finetune language models for **studying knowledge learning from text**. A variety of reasoning tasks, ranging from easy to difficult, are also provided to **evaluate whether a model has learned the facts and can generalize them in reasoning**.

- **Paper:** [Link pending]

### Facts

This subset contains the facts in triplet form. All other subsets are derived from it.

- **Facts**: 20 facts about capital cities and 20 facts about the famous animals of these cities, in triplet form. For example:
  - *(Andoria, capital_city, Copperton)*
  - *(Copperton, famous_for, lion)*

### Corpora

Two kinds of text corpora are provided based on the facts: *narrative* and *referencing*.

- **Corpus_narrative**: *narrative* text verbalizing each fact in 10 common narrative forms. For example:
  - *The capital city of \{country\} is \{city\}.*
  - *\{city\} is the capital of \{country\}.*
  - *\{country\}'s capital city is \{city\}.*
- **Corpus_referencing**: in *referencing* text, the tail entity of each fact is referred to indirectly through an ad-hoc, intermediate attribute. The ad-hoc attributes are associated with the entities only temporarily, within the scope of a single text. For example:
  - (coloring) *\{random\_city\_1\} is colored in red. \{random\_city\_2\} is colored in blue. \{city\} is colored in green. \{random\_city\_3\} is colored in yellow. The capital city of \{country\} is colored in green.*
  - (multiple choice) *Which city is the capital city of \{country\}? A. \{random\_city\_1\} B. \{random\_city\_2\} C. \{city\} D. \{random\_city\_3\} Answer: C*

### Reasoning tasks

Several question-answering tasks are provided to evaluate memorization of the facts and reasoning with them under different scenarios. The tasks are listed by difficulty, from easy to hard.

- **Eval_QA**: simple questions directly asking for the tail entity. For example:
  - *What is the capital city of \{country\}? Answer: \{city\}*
- **Eval_multiple_choice**: choose the correct tail entity from a set of candidates. For example:
  - *What is the capital city of \{country\}? A. \{choice1\} B. \{choice2\} C. \{choice3\} D. \{city\} Answer: D*
- **Eval_reverse**: simple questions asking for the head entity. For example:
  - *Which country has \{city\} as its capital city? Answer: \{country\}*
- **Eval_indirect_reasoning**: questions requiring simple reasoning that combines the facts with commonsense knowledge of common animals. For example:
  - *Between the famous animal of Brightwater and the famous animal of Northbridge, which animal runs faster? Answer: the famous animal of Brightwater*
- **Eval_animal_commonsense**: questions about commonsense knowledge of animals, which the *Eval_indirect_reasoning* task (derived from this subset) requires implicitly. It can be used to sanity-check whether the model has enough commonsense knowledge to answer the indirect reasoning questions. For example:
  - *Between zebra and turtle, which animal runs faster? Answer: zebra*
- **Eval_2hop_reasoning**: questions requiring 2-hop reasoning that combines two facts. For example:
  - *Which animal is the capital city of \{country\} famous for? Answer: \{animal\}*

### Citation Information

```
pending
```
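In the coloring form of Corpus_referencing, the capital is named only through a color that holds inside one passage. The Python sketch below illustrates that idea under stated assumptions: one true city plus three distractors and a fixed four-color palette. The function `coloring_passage`, the palette, and the `random_city_N` placeholders are hypothetical and are not taken from the dataset's generation code.

```python
import random

# Hypothetical palette; the dataset's actual attribute values may differ.
COLORS = ["red", "blue", "green", "yellow"]


def coloring_passage(country: str, city: str, distractors: list[str], rng: random.Random) -> str:
    """Refer to `city` only through a color assigned within this single passage."""
    cities = distractors + [city]
    rng.shuffle(cities)  # the true city lands at a random position
    colors = rng.sample(COLORS, k=len(cities))  # one distinct color per city
    assignment = dict(zip(cities, colors))
    sentences = [f"{c} is colored in {assignment[c]}." for c in cities]
    # The final sentence names the color, never the city itself.
    sentences.append(f"The capital city of {country} is colored in {assignment[city]}.")
    return " ".join(sentences)


rng = random.Random(0)
passage = coloring_passage(
    "Andoria", "Copperton", ["random_city_1", "random_city_2", "random_city_3"], rng
)
print(passage)
```

Drawing a fresh color assignment for every passage keeps the attribute ad hoc, as the card describes: the color links the city to the capital only inside that one text.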

Provided by:

amounts-tidings

Summary of the original information

Dataset overview

Dataset name

Country-city-animals

Dataset contents

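Facts: 20 facts about capital cities and 20 facts about the famous animals of these cities, provided as triplets.
Corpora: two kinds of text corpora derived from the facts, narrative and referencing.
- Corpus_narrative: narrative text expressing each fact in 10 common narrative forms.
- Corpus_referencing: referencing text that refers to the tail entity of each fact indirectly through a temporary attribute.
Reasoning tasks: several question-answering tasks that evaluate memorization of and reasoning with the facts, ordered from easy to hard.
- Eval_QA: simple questions directly asking for the tail entity.
- Eval_multiple_choice: choose the correct tail entity from a set of candidates.
- Eval_reverse: simple questions asking for the head entity.
- Eval_indirect_reasoning: questions requiring simple reasoning with the facts and commonsense knowledge.
- Eval_animal_commonsense: questions about animal commonsense, used to check whether the model has the commonsense knowledge needed for the indirect reasoning task.
- Eval_2hop_reasoning: questions requiring 2-hop reasoning that combines two facts.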
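Since all other subsets are derived from the Facts triplets, the relationship can be illustrated with a small Python sketch. It reuses only the two example facts and the templates quoted in the dataset card; the function names `narrate` and `questions` are hypothetical, and this is not the authors' generation script (the card lists only three of the ten narrative forms).

```python
# The two example triplets quoted in the dataset card.
FACTS = [
    ("Andoria", "capital_city", "Copperton"),
    ("Copperton", "famous_for", "lion"),
]

# Three of the ten narrative forms quoted in the card; the other seven are not listed.
CAPITAL_TEMPLATES = [
    "The capital city of {country} is {city}.",
    "{city} is the capital of {country}.",
    "{country}'s capital city is {city}.",
]


def narrate(facts):
    """Verbalize every capital_city triplet with every template (Corpus_narrative style)."""
    return [
        template.format(country=head, city=tail)
        for head, rel, tail in facts
        if rel == "capital_city"
        for template in CAPITAL_TEMPLATES
    ]


def questions(facts):
    """Build (question, answer) pairs in the Eval_QA, Eval_reverse and Eval_2hop_reasoning formats."""
    lookup = {(head, rel): tail for head, rel, tail in facts}
    pairs = []
    for head, rel, tail in facts:
        if rel != "capital_city":
            continue
        # Forward question: ask for the tail entity (Eval_QA).
        pairs.append((f"What is the capital city of {head}?", tail))
        # Reverse question: ask for the head entity (Eval_reverse).
        pairs.append((f"Which country has {tail} as its capital city?", head))
        # 2-hop question: country -> capital city -> famous animal (Eval_2hop_reasoning).
        animal = lookup.get((tail, "famous_for"))
        if animal is not None:
            pairs.append((f"Which animal is the capital city of {head} famous for?", animal))
    return pairs


for sentence in narrate(FACTS):
    print(sentence)
for question, answer in questions(FACTS):
    print(question, "Answer:", answer)
```

The 2-hop question only exists when both triplets are present, which is why it is harder than the single-fact questions: the model must chain capital_city and famous_for without ever having seen them stated together.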

Dataset configurations

Corpus_narrative:
- Features: text (string)
- Splits: train (360 examples, 18886 bytes)
Corpus_referencing:
- Features: text (string)
- Splits: train (480 examples, 64678 bytes)
Eval_2hop_reasoning:
- Features: question (string), answer (string)
- Splits: test (20 examples, 1448 bytes)
Eval_QA:
- Features: question (string), answer (string)
- Splits: test (40 examples, 2134 bytes)
Eval_animal_commonsense:
- Features: question (string), answer (string)
- Splits: test (100 examples, 7254 bytes)
Eval_indirect_reasoning:
- Features: question (string), answer (string)
- Splits: test (100 examples, 15241 bytes)
Eval_multiple_choice:
- Features: question (string), choices (sequence of strings), answer (int64)
- Splits: test (160 examples, 16193 bytes)
Eval_reverse:
- Features: question (string), answer (string)
- Splits: test (40 examples, 2394 bytes)
Facts:
- Features: head (string), relation (string), tail (string)
- Splits: train (40 examples, 1614 bytes)
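The configurations above store string answers for most evaluation subsets, but Eval_multiple_choice stores `choices` as a list of strings and `answer` as an int64. A minimal scoring sketch follows, assuming that `answer` is a 0-based index into `choices` and that string answers are compared by normalized exact match; the card states neither metric, and the helper names and the sample row are hypothetical.

```python
def normalize(text: str) -> str:
    """Assumed normalization: trim whitespace, drop a trailing period, lowercase."""
    return text.strip().rstrip(".").strip().lower()


def exact_match(predictions: list[str], answers: list[str]) -> float:
    """Share of predictions equal to the gold string answer after normalization."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers differ in length")
    hits = sum(normalize(p) == normalize(a) for p, a in zip(predictions, answers))
    return hits / len(answers)


def choice_accuracy(picked: list[int], rows: list[dict]) -> float:
    """Accuracy on Eval_multiple_choice-style rows, assuming `answer` is a 0-based index into `choices`."""
    if len(picked) != len(rows):
        raise ValueError("one picked index is needed per row")
    hits = 0
    for idx, row in zip(picked, rows):
        if not 0 <= row["answer"] < len(row["choices"]):
            raise ValueError(f"answer index {row['answer']} is outside choices")
        hits += idx == row["answer"]
    return hits / len(rows)


# Hypothetical row modeled on the card's multiple-choice example ("Answer: D").
rows = [
    {
        "question": "What is the capital city of Andoria?",
        "choices": ["random_city_1", "random_city_2", "random_city_3", "Copperton"],
        "answer": 3,
    }
]
print(choice_accuracy([3], rows))
print(exact_match(["Copperton."], ["Copperton"]))
```

With the Hugging Face `datasets` library installed, real rows could be obtained with `load_dataset("amounts-tidings/Country-city-animals", "Eval_multiple_choice", split="test")`; how the question and choices are rendered into a model prompt is left open here.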

Intended use

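Training or finetuning language models to study how they learn knowledge from text.
Evaluating, through reasoning tasks of varying difficulty, how well a model learns the facts and generalizes them.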
