
amounts-tidings/Country-city-animals

Source: Hugging Face. Updated 2024-06-03. Indexed 2024-06-12.
Download link:
https://hf-mirror.com/datasets/amounts-tidings/Country-city-animals
Resource description:
---
language:
- en
license: apache-2.0
task_categories:
- question-answering
dataset_info:
- config_name: Corpus_narrative
  features:
  - name: text
    dtype: string
  splits:
  - name: train
    num_bytes: 18886
    num_examples: 360
  download_size: 6208
  dataset_size: 18886
- config_name: Corpus_referencing
  features:
  - name: text
    dtype: string
  splits:
  - name: train
    num_bytes: 64678
    num_examples: 480
  download_size: 14874
  dataset_size: 64678
- config_name: Eval_2hop_reasoning
  features:
  - name: question
    dtype: string
  - name: answer
    dtype: string
  splits:
  - name: test
    num_bytes: 1448
    num_examples: 20
  download_size: 2142
  dataset_size: 1448
- config_name: Eval_QA
  features:
  - name: question
    dtype: string
  - name: answer
    dtype: string
  splits:
  - name: test
    num_bytes: 2134
    num_examples: 40
  download_size: 2532
  dataset_size: 2134
- config_name: Eval_animal_commonsense
  features:
  - name: question
    dtype: string
  - name: answer
    dtype: string
  splits:
  - name: test
    num_bytes: 7254
    num_examples: 100
  download_size: 4166
  dataset_size: 7254
- config_name: Eval_indirect_reasoning
  features:
  - name: question
    dtype: string
  - name: answer
    dtype: string
  splits:
  - name: test
    num_bytes: 15241
    num_examples: 100
  download_size: 4966
  dataset_size: 15241
- config_name: Eval_multiple_choice
  features:
  - name: question
    dtype: string
  - name: choices
    sequence: string
  - name: answer
    dtype: int64
  splits:
  - name: test
    num_bytes: 16193
    num_examples: 160
  download_size: 3818
  dataset_size: 16193
- config_name: Eval_reverse
  features:
  - name: question
    dtype: string
  - name: answer
    dtype: string
  splits:
  - name: test
    num_bytes: 2394
    num_examples: 40
  download_size: 2631
  dataset_size: 2394
- config_name: Facts
  features:
  - name: head
    dtype: string
  - name: relation
    dtype: string
  - name: tail
    dtype: string
  splits:
  - name: train
    num_bytes: 1614
    num_examples: 40
  download_size: 2707
  dataset_size: 1614
configs:
- config_name: Corpus_narrative
  data_files:
  - split: train
    path: Corpus_narrative/train-*
- config_name: Corpus_referencing
  data_files:
  - split: train
    path: Corpus_referencing/train-*
- config_name: Eval_2hop_reasoning
  data_files:
  - split: test
    path: Eval_2hop_reasoning/test-*
- config_name: Eval_QA
  data_files:
  - split: test
    path: Eval_QA/test-*
- config_name: Eval_animal_commonsense
  data_files:
  - split: test
    path: Eval_animal_commonsense/test-*
- config_name: Eval_indirect_reasoning
  data_files:
  - split: test
    path: Eval_indirect_reasoning/test-*
- config_name: Eval_multiple_choice
  data_files:
  - split: test
    path: Eval_multiple_choice/test-*
- config_name: Eval_reverse
  data_files:
  - split: test
    path: Eval_reverse/test-*
- config_name: Facts
  data_files:
  - split: train
    path: Facts/train-*
---

# Country-city-animals: a dataset of synthetic facts, with corresponding corpora and reasoning tasks

Country-city-animals is a dataset of **simple synthetic facts** about countries, cities, and animals. The facts are provided in both triplet form and in text form, and can be used to train or finetune language models for **studying knowledge learning from text**. A variety of reasoning tasks are also provided to **evaluate whether a model has learned the facts and can generalize them in reasoning tasks** from easy to difficult.

- **Paper:** [Link pending]

### Facts

This subset contains the facts in triplet form. All other subsets are derived from this one.

- **Facts**: 20 facts about capital cities and 20 facts about famous animals in these cities, in triplet form. For example:
  - *(Andoria, capital_city, Copperton)*
  - *(Copperton, famous_for, lion)*

### Corpora

Two kinds of text corpora are provided based on the facts: *Narrative* and *Referencing*.

- **Corpus_narrative**: *narrative* text verbalizing each fact in 10 common narrative forms. For example:
  - *The capital city of \{country\} is \{city\}.*
  - *\{city\} is the capital of \{country\}.*
  - *\{country\}'s capital city is \{city\}.*
- **Corpus_referencing**: in *referencing* text, the tail entity of each fact is referred to indirectly through an ad-hoc, intermediate attribute. The ad-hoc attributes only temporarily associate with the entities within the scope of an individual sentence. For example:
  - (coloring) *\{random\_city\_1\} is colored in red. \{random\_city\_2\} is colored in blue. \{city\} is colored in green. \{random\_city\_3\} is colored in yellow. The capital city of \{country\} is colored in green.*
  - (multiple choice) *Which city is the capital city of \{country\}? A. \{random\_city\_1\} B. \{random\_city\_2\} C. \{city\} D. \{random\_city\_3\} Answer: C*

### Reasoning tasks

Several question answering tasks are provided to evaluate memorization and reasoning with the facts under different scenarios. The tasks are listed by difficulty from easy to hard.

- **Eval_QA**: simple questions directly asking for the tail entity. For example:
  - *What is the capital city of \{country\}? Answer: <u>\{city\}</u>*
- **Eval_multiple_choice**: choose the correct tail entity from a set of candidates. For example:
  - *What is the capital city of \{country\}? A. \{choice1\} B. \{choice2\} C. \{choice3\} D. \{city\} Answer: <u>D</u>*
- **Eval_reverse**: simple questions asking for the head entity. For example:
  - *Which country has \{city\} as its capital city? Answer: <u>\{country\}</u>*
- **Eval_indirect_reasoning**: questions requiring simple reasoning using the facts and commonsense knowledge of common animals. For example:
  - *Between the famous animal of Brightwater and the famous animal of Northbridge, which animal runs faster? Answer: <u>the famous animal of Brightwater</u>*
- **Eval_animal_commonsense**: questions about commonsense knowledge of animals (required implicitly by the *Eval_indirect_reasoning* task, which is derived from this subset). Can be used for sanity-checking if the model has sufficient commonsense knowledge to answer the indirect reasoning tasks. For example:
  - *Between zebra and turtle, which animal runs faster? Answer: <u>zebra</u>*
- **Eval_2hop_reasoning**: questions requiring 2-hop reasoning combining two facts. For example:
  - *Which animal is the capital city of \{country\} famous for? Answer: <u>\{animal\}</u>*

### Citation Information

```
pending
```
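To make the two corpus styles described under Corpora more concrete, here is a minimal sketch of how a single capital-city fact could be verbalized. The three narrative templates and the coloring pattern are taken from the card; the function names `narrative_sentences` and `coloring_reference`, the fixed color list, and the sampling procedure are illustrative assumptions, not the authors' actual generation code.

```python
import random

# Three of the narrative templates quoted in the card (capital_city relation only).
NARRATIVE_TEMPLATES = [
    "The capital city of {country} is {city}.",
    "{city} is the capital of {country}.",
    "{country}'s capital city is {city}.",
]

# Assumption: a small fixed palette; the card's example uses these four colors.
COLORS = ["red", "blue", "green", "yellow"]


def narrative_sentences(country, city):
    """Verbalize one (country, capital_city, city) fact with every template."""
    return [t.format(country=country, city=city) for t in NARRATIVE_TEMPLATES]


def coloring_reference(country, city, distractors, rng):
    """Build one 'coloring' passage: the capital is named only through its color."""
    cities = [city] + list(distractors)
    rng.shuffle(cities)
    # One distinct color per city, valid only within this passage.
    color_of = dict(zip(cities, rng.sample(COLORS, k=len(cities))))
    sentences = [f"{c} is colored in {color_of[c]}." for c in cities]
    sentences.append(f"The capital city of {country} is colored in {color_of[city]}.")
    return " ".join(sentences)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
print(narrative_sentences("Andoria", "Copperton"))
print(coloring_reference("Andoria", "Copperton", ["Brightwater", "Northbridge"], rng))
```

The final sentence of the coloring passage never mentions the capital by name, so a model has to resolve the color back to the city within the passage to recover the fact.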
Provided by:
amounts-tidings
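Summary of original information

Dataset overview

Dataset name

  • Country-city-animals

Dataset contents

  • Facts: 20 facts about capital cities and 20 facts about the famous animals of these cities, provided in triplet form.
  • Corpora: two kinds of text corpora derived from the facts, narrative and referencing.
    • Corpus_narrative: narrative text that expresses each fact in 10 common narrative forms.
    • Corpus_referencing: referencing text in which the tail entity of each fact is referred to indirectly through an ad-hoc attribute.
  • Reasoning tasks: several question answering tasks that evaluate memorization of and reasoning with the facts, ordered from easy to hard.
    • Eval_QA: simple questions that ask directly for the tail entity.
    • Eval_multiple_choice: choose the correct tail entity from a set of candidates.
    • Eval_reverse: simple questions that ask for the head entity.
    • Eval_indirect_reasoning: questions that require simple reasoning with the facts and commonsense knowledge.
    • Eval_animal_commonsense: questions about commonsense knowledge of animals, used to check whether a model has the commonsense knowledge needed for the indirect reasoning task.
    • Eval_2hop_reasoning: questions that require 2-hop reasoning combining two facts.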
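
The relationship between the Facts triplets and the direct, reverse, and 2-hop question subsets listed above can be sketched as follows. The question wording is copied from the dataset card, and the two triplets are the card's own examples; the function name `build_questions` and the derivation procedure are assumptions for illustration, since the card does not document how the subsets were generated. Only the capital_city question templates appear in the card, so the sketch covers that relation alone.

```python
# The two example triplets quoted in the dataset card.
FACTS = [
    ("Andoria", "capital_city", "Copperton"),
    ("Copperton", "famous_for", "lion"),
]


def build_questions(facts):
    """Derive direct, reverse, and 2-hop questions from (head, relation, tail) triplets."""
    capital_of = {h: t for h, r, t in facts if r == "capital_city"}
    animal_of = {h: t for h, r, t in facts if r == "famous_for"}

    qa, reverse, two_hop = [], [], []
    for country, city in capital_of.items():
        # Direct question: ask for the tail entity.
        qa.append({"question": f"What is the capital city of {country}?",
                   "answer": city})
        # Reverse question: ask for the head entity.
        reverse.append({"question": f"Which country has {city} as its capital city?",
                        "answer": country})
        # 2-hop question: chain country -> city -> animal without naming the city.
        if city in animal_of:
            two_hop.append({
                "question": f"Which animal is the capital city of {country} famous for?",
                "answer": animal_of[city],
            })
    return {"Eval_QA": qa, "Eval_reverse": reverse, "Eval_2hop_reasoning": two_hop}


for subset, rows in build_questions(FACTS).items():
    print(subset, rows)
```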

Dataset configuration

  • Corpus_narrative:
    • Features: text (string)
    • Splits: train (360 examples, 18886 bytes)
  • Corpus_referencing:
    • Features: text (string)
    • Splits: train (480 examples, 64678 bytes)
  • Eval_2hop_reasoning:
    • Features: question (string), answer (string)
    • Splits: test (20 examples, 1448 bytes)
  • Eval_QA:
    • Features: question (string), answer (string)
    • Splits: test (40 examples, 2134 bytes)
  • Eval_animal_commonsense:
    • Features: question (string), answer (string)
    • Splits: test (100 examples, 7254 bytes)
  • Eval_indirect_reasoning:
    • Features: question (string), answer (string)
    • Splits: test (100 examples, 15241 bytes)
  • Eval_multiple_choice:
    • Features: question (string), choices (sequence of string), answer (int64)
    • Splits: test (160 examples, 16193 bytes)
  • Eval_reverse:
    • Features: question (string), answer (string)
    • Splits: test (40 examples, 2394 bytes)
  • Facts:
    • Features: head (string), relation (string), tail (string)
    • Splits: train (40 examples, 1614 bytes)
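
The configuration listing above maps directly onto the `name` and `split` arguments of `load_dataset` in the Hugging Face `datasets` library. The sketch below assumes the repository id on the Hub matches the listing title (`amounts-tidings/Country-city-animals`) and that `datasets` is installed with network access available; the helper `load_config` and the `EXPECTED` table are hypothetical names introduced here.

```python
# (split, num_examples) per configuration, copied from the listing above.
EXPECTED = {
    "Corpus_narrative": ("train", 360),
    "Corpus_referencing": ("train", 480),
    "Eval_2hop_reasoning": ("test", 20),
    "Eval_QA": ("test", 40),
    "Eval_animal_commonsense": ("test", 100),
    "Eval_indirect_reasoning": ("test", 100),
    "Eval_multiple_choice": ("test", 160),
    "Eval_reverse": ("test", 40),
    "Facts": ("train", 40),
}


def load_config(name, repo_id="amounts-tidings/Country-city-animals"):
    """Load one configuration and check its row count against the metadata.

    Assumption: repo_id is the Hub id implied by the listing title.
    """
    # Imported lazily so the table above is usable without the library installed.
    from datasets import load_dataset

    split, expected_rows = EXPECTED[name]
    ds = load_dataset(repo_id, name, split=split)
    if ds.num_rows != expected_rows:
        raise ValueError(f"{name}: expected {expected_rows} rows, got {ds.num_rows}")
    return ds


if __name__ == "__main__":
    facts = load_config("Facts")
    print(facts[0])  # a dict with "head", "relation", and "tail" keys
```

Checking the row count on load is a cheap guard against a mirror serving a stale or partial copy.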

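Dataset uses

  • Training or finetuning language models to study how they learn knowledge from text.
  • Evaluating how well a model has learned the facts and generalizes them, using reasoning tasks of varying difficulty.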
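
For the evaluation use described above, one simple way to score model outputs is exact match for the string-answer subsets and index comparison for Eval_multiple_choice, whose `answer` field is an `int64`. This is a sketch under assumptions: the card does not prescribe a metric, the normalization rule is an illustrative choice, and whether the multiple-choice index is 0-based or 1-based is not stated in the card, so the code only compares integers as given.

```python
def normalize(text):
    """Lowercase, trim, drop a trailing period, and collapse internal whitespace."""
    return " ".join(text.strip().lower().rstrip(".").split())


def exact_match_accuracy(predictions, examples):
    """Share of predictions equal to the gold `answer` string after normalization."""
    hits = sum(normalize(p) == normalize(ex["answer"])
               for p, ex in zip(predictions, examples))
    return hits / len(examples)


def multiple_choice_accuracy(predicted_indices, examples):
    """Share of predicted choice indices equal to the gold integer `answer`."""
    hits = sum(int(p) == int(ex["answer"])
               for p, ex in zip(predicted_indices, examples))
    return hits / len(examples)


# Tiny illustration using the card's example entities.
qa_examples = [
    {"question": "What is the capital city of Andoria?", "answer": "Copperton"},
    {"question": "Between zebra and turtle, which animal runs faster?", "answer": "zebra"},
]
print(exact_match_accuracy(["copperton.", "turtle"], qa_examples))
```

Running Eval_animal_commonsense through the same metric first gives a baseline for interpreting Eval_indirect_reasoning scores, since the latter depends on that commonsense knowledge.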