Joanne/Metaphors_and_Analogies

Source: Hugging Face. Updated 2023-05-30. Indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/Joanne/Metaphors_and_Analogies
Resource description:
---
task_categories:
- question-answering
- token-classification
language:
- en
---

# Metaphors and analogies datasets

These datasets contain word pairs and quadruples forming analogies, metaphoric mappings or semantically unacceptable compositions.

- Pair instances are pairs of nouns A and B in a sentence of the form "A is a B".
- Quadruple instances are of the form <(A,B),(C,D)>.

There is an analogy when A is to B what C is to D. The analogy is also a metaphor when (A,B) and (C,D) form a metaphoric mapping, usually when they come from different domains.

## Dataset Description

- **Homepage:**
- **Repository:**
- **Paper:**
- **Leaderboard:**
- **Point of Contact:**

Language : English

### Datasets and paper links

| Name | Size | Labels | Description |
| ---------: | :----- | :-------- | :-------------------------------------------------------------------------- |
| `Cardillo` | 260*2 | 1, 2 | Pairs of "A is-a B" sentences, each composed of one metaphoric and one literal sentence. The two sentences of a given pair share the same B term. |
| `Jankowiak` | 120*3 | 0, 1, 2 | Triples of "A is-a/is-like-a B" sentences with exactly one literal, one semantically anomalous and one metaphoric sentence. |
| `Green` | 40*3 | 0, 1, 2 | Triples of proportional analogies, made of 4 terms <A, B, Ci, Di> each. One stem <A,B> is composed with 3 different <Ci,Di> pairs to form exactly one near analogy, one far analogy and one non-analogical quadruple. |
| `Kmiecik` | 720 | 0, 1, 2 | Quadruples <A,B,C,D> labelled as analogy: True/False and far_analogy: True/False. |
| `SAT-met` | 160?*5 | 0, 1, 2, 12 | One pair stem <A,B> to combine with 5 different pairs <Ci,Di> in an attempt to form proportional analogies. Only one <Ci,Di> forms an analogy with <A,B>. We additionally labelled the analogies as **metaphoric**: True/False. |

| Name | Paper Citation | Paper link | Dataset link |
| ---------: | :------- | :------------------------------ | -----------------------------------------: |
| `Cardillo` | | [Cardillo (2010)](https://link.springer.com/article/10.3758/s13428-016-0717-1) [Cardillo (2017)](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2952404/) | |
| `Jankowiak` | | [Jankowiak (2020)](https://link-springer-com.abc.cardiff.ac.uk/article/10.1007/s10936-020-09695-7) | |
| `Green` | Green, A. E., Kraemer, D. J. M., Fugelsang, J., Gray, J. R., & Dunbar, K. (2010). Connecting Long Distance: Semantic Distance in Analogical Reasoning Modulates Frontopolar Cortex Activity. Cerebral Cortex, 20, 70-76. | [Green (20)]() | |
| `Kmiecik` | Kmiecik, M. J., Brisson, R. J., & Morrison, R. G. (2019). The time course of semantic and relational processing during verbal analogical reasoning. Brain and Cognition, 129, 25-34. | [Kmiecik (20)]() | |
| `SAT-met` | | [Turney (2005)](https://arxiv.org/pdf/cs/0508053.pdf) | |

### Labels :

- Pairs
  - **0** : anomaly
  - **1** : literal
  - **2** : metaphor
- Quadruples :
  - **0** : not an analogy
  - **1** : an analogy but not a metaphor
  - **2** : an analogy and a metaphor or a far analogy
  - **12** : maybe a metaphor, somewhere between 1 and 2

### Dataset Splits

- Both lexical and random splits are available for classification experiments.
- Size of the splits :
  - **train** : 50 %
  - **validation** : 10 %
  - **test** : 40 %
- Additionally, for all datasets, the `5-folds` field gives frozen splits for a five-fold cross-validation experiment with train/val/test = 70/10/20% of the sets.

# Datasets for Classification

- Task : binary classification or 3-class classification of pairs or quadruples. Each pair or quadruple is to be classified as anomaly, non-metaphoric or metaphoric.

## Pairs

### Dataset names & splits :

| Original set | Dataset name | Split |
| -------------: | :------------ | :------ |
| Cardillo | Pairs\_Cardillo\_random_split | random |
| | Pairs\_Cardillo\_lexical_split | lexical |
| Jankowiak | Pairs\_Jankowiac\_random_split | random |
| | Pairs\_Jankowiac\_lexical_split | lexical |

### Data fields :

| Field | Description | Type |
| -------------: | :------------ | ---- |
| corpus | Name of the original dataset | str |
| id | Instance id | str |
| set_id | Id of the set containing the given instance in the multiple-choice task | int |
| label | 0, 1, 2 | int |
| sentence | A is-a B sentence | str |
| A | A expression in the sentence | str |
| B | B expression in the sentence | str |
| A\_position | Position of A in the sentence | list(int) |
| B\_position | Position of B in the sentence | list(int) |
| 5-folds | Frozen splits for cross-validation | list(str) |

### Examples :

| Name | Example | Label |
| -------: | :------------------------------------- | :-------- |
| Cardillo | | |
| Jankowiak | | |

## Quadruples

### Dataset names & splits

| Original set | Dataset name | Split |
| -------: | :------------------------------------- | :-------- |
| Green | Quadruples\_Green\_random_split | random |
| | Quadruples\_Green\_lexical_split | lexical |
| Kmiecik | Quadruples\_Kmiecik\_random_split | random |
| | Quadruples\_Kmiecik\_lexical\_split\_on\_AB | lexical AB |
| | Quadruples\_Kmiecik\_lexical_split\_on\_CD | lexical CD |
| SAT | Quadruples\_SAT\_random\_split | random |
| | Quadruples\_SAT\_lexical\_split | lexical |

### Data fields :

| Field | Description | Type |
| -------------: | :------------ | :------------ |
| corpus | Name of the original dataset | str |
| id | Element id | str |
| set\_id | Id of the set containing the given instance in the multiple-choice task datasets | int |
| label | 0, 1, 2, 12 | int |
| AB | Pair of terms | list(str) |
| CD | Pair of terms | list(str) |
| 5-folds | Frozen splits for cross-validation | list(str) |

### Examples :

| Name | Example | Label |
| -------: | :------------------------------------- | :-------- |
| Green | | |
| Kmiecik | | |
| SAT | | |

# Datasets for multiple choice questions or permutation

- Task : one stem and multiple choices. The stem is combined with each of its candidate completions to form a sentence. Each resulting sentence has a label <0,1,2>.

## Pairs

### Dataset names & splits :

| Original set | Dataset name | Split |
| ----------- | ------ | :---- |
| Cardillo | Pairs\_Cardillo\_set | test only |
| Jankowiak | Pairs\_Jankowiac\_set | test only |

### Data fields :

| Field | Description | Type |
| -------------: | :------------ | :------------ |
| corpus | Name of the original dataset | str |
| id | Element id | str |
| pair_ids | Ids of each pair as appearing in the classification datasets | list(str) |
| labels | 0, 1, 2 | list(int) |
| sentences | List of the sentences composing the set | list(str) |
| A\_positions | Positions of the A's in each sentence | list(list(int)) |
| B\_positions | Positions of the B's in each sentence | list(list(int)) |
| answer | Index of the metaphor | int |
| stem | Term shared between the sentences of the set | str |
| 5-folds | Frozen splits for cross-validation | list(str) |

### Examples :

| Name | Stem | Sentences | Label |
| -------: | -------: | :------------------------------------- | :-------- |
| Cardillo | comet | The astronomer's obsession was a comet. | 1 |
| | | The politician's career was a comet. | 2 |
| Jankowiak | harbour | This banana is like a harbour | 0 |
| | | A house is a harbour | 2 |
| | | This area is a harbour | 1 |

## Quadruples

### Dataset names & splits :

| Original set | Dataset name | Split |
| ----------: | :------ | :---- |
| Green | Quadruples\_Green\_set | test only |
| SAT | Quadruples\_SAT\_met_set | test only |

### Data fields :

| Field | Description | Type |
| -------------: | :------------ | :------------ |
| corpus | Name of the original dataset | str |
| id | Element id | str |
| pair\_ids | Ids of the instances as appearing in the classification datasets | list(str) |
| labels | 0, 1, 2, 12 | list(int) |
| answer | temp | int |
| stem | Word pair to compose with all the other pairs of the set | list(str) |
| pairs | List of word pairs | list(list(str)) |
| 5-folds | Frozen splits for cross-validation | list(str) |

### Examples :

| Name | Example | Label |
| -------: | :------------------------------------- | :-------- |
| Green | | |
| | | |
| SAT | | |
Provided by:
Joanne
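Summary of the original information

# Metaphors and analogies datasets

These datasets contain word pairs and quadruples that form analogies, metaphoric mappings or semantically unacceptable compositions.

- Pair instances: pairs of nouns A and B in a sentence of the form "A is a B".
- Quadruple instances: of the form <(A,B),(C,D)>. There is an analogy when A is to B what C is to D. The analogy is also a metaphor when (A,B) and (C,D) form a metaphoric mapping, usually when they come from different domains.

## Dataset Description

### Datasets and paper links

| Name | Size | Labels | Description |
| ---------: | :----- | :-------- | :------------ |
| `Cardillo` | 260*2 | 1, 2 | Pairs of "A is-a B" sentences made of one metaphoric and one literal sentence. The two sentences share the same B term. |
| `Jankowiak` | 120*3 | 0, 1, 2 | Triples of "A is-a/is-like-a B" sentences containing one literal, one semantically anomalous and one metaphoric sentence. |
| `Green` | 40*3 | 0, 1, 2 | Quadruples of proportional analogies made of 4 terms <A, B, Ci, Di>. One stem <A,B> is combined with 3 different <Ci,Di> pairs to form one near analogy, one far analogy and one non-analogical quadruple. |
| `Kmiecik` | 720 | 0, 1, 2 | Quadruples <A,B,C,D> labelled as analogy: True/False and far analogy: True/False. |
| `SAT-met` | 160?*5 | 0, 1, 2, 12 | One pair stem <A,B> combined with 5 different pairs <Ci,Di> in an attempt to form proportional analogies. Only one <Ci,Di> forms an analogy with <A,B>. The analogies are additionally labelled as metaphoric: True/False. |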

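### Labels

- Pairs
  - **0** : anomaly
  - **1** : literal
  - **2** : metaphor
- Quadruples
  - **0** : not an analogy
  - **1** : an analogy but not a metaphor
  - **2** : an analogy and a metaphor, or a far analogy
  - **12** : maybe a metaphor, somewhere between 1 and 2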
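
The label scheme above can be written down as two lookup tables, which makes it explicit that label 12 exists only for quadruples. This is a minimal sketch: the names `PAIR_LABELS`, `QUADRUPLE_LABELS` and `describe_label` are illustrative and are not part of the dataset.

```python
from typing import Dict

# Label meanings copied from the lists above; all names here are illustrative.
PAIR_LABELS: Dict[int, str] = {
    0: "anomaly",
    1: "literal",
    2: "metaphor",
}
QUADRUPLE_LABELS: Dict[int, str] = {
    0: "not an analogy",
    1: "an analogy but not a metaphor",
    2: "an analogy and a metaphor or a far analogy",
    12: "maybe a metaphor, somewhere between 1 and 2",
}


def describe_label(label: int, kind: str) -> str:
    """Return the meaning of `label` for kind 'pair' or 'quadruple'."""
    tables = {"pair": PAIR_LABELS, "quadruple": QUADRUPLE_LABELS}
    if kind not in tables:
        raise ValueError(f"unknown kind: {kind!r}")
    if label not in tables[kind]:
        raise ValueError(f"label {label} is not defined for {kind} instances")
    return tables[kind][label]


print(describe_label(2, "pair"))        # metaphor
print(describe_label(12, "quadruple"))  # maybe a metaphor, somewhere between 1 and 2
```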

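### Dataset Splits

- Both lexical and random splits are available for classification experiments.
- Size of the splits:
  - **train** : 50 %
  - **validation** : 10 %
  - **test** : 40 %
- Additionally, for all datasets, the `5-folds` field gives frozen splits for a five-fold cross-validation experiment with train/validation/test = 70/10/20 %.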
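
To get a feel for what these percentages mean in instance counts, the sketch below converts a total size into per-split sizes. It is only arithmetic under stated assumptions: the helper `split_sizes` is hypothetical, the rounding rule (largest remainder) is a choice made here, and the sizes of the published splits may differ, in particular for lexical splits.

```python
from typing import Dict

# Percentages taken from the text above.
MAIN_SPLIT = {"train": 50, "validation": 10, "test": 40}
FOLD_SPLIT = {"train": 70, "validation": 10, "test": 20}


def split_sizes(n_items: int, percentages: Dict[str, int]) -> Dict[str, int]:
    """Turn integer percentages into counts that sum exactly to n_items."""
    if sum(percentages.values()) != 100:
        raise ValueError("percentages must sum to 100")
    # Floor each share first ...
    sizes = {name: n_items * pct // 100 for name, pct in percentages.items()}
    leftover = n_items - sum(sizes.values())
    # ... then hand the leftover items to the splits with the largest remainder.
    by_remainder = sorted(
        percentages,
        key=lambda name: (n_items * percentages[name]) % 100,
        reverse=True,
    )
    for name in by_remainder[:leftover]:
        sizes[name] += 1
    return sizes


# 720 is the size listed for the Kmiecik set.
print(split_sizes(720, MAIN_SPLIT))  # {'train': 360, 'validation': 72, 'test': 288}
print(split_sizes(720, FOLD_SPLIT))  # {'train': 504, 'validation': 72, 'test': 144}
```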

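# Datasets for Classification

- Task: binary or 3-class classification of pairs or quadruples. Each pair or quadruple is classified as anomaly, non-metaphoric or metaphoric.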
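
Whether a given set is a binary or a 3-class problem follows from the labels it actually contains: Cardillo lists only labels 1 and 2, whereas Jankowiak lists 0, 1 and 2. The sketch below simply counts distinct labels. The function names are illustrative, and how to treat label 12 in SAT-met is left open because the source does not say.

```python
from collections import Counter
from typing import Dict, Iterable, List


def class_distribution(labels: Iterable[int]) -> Dict[int, int]:
    """Count how often each label occurs, sorted by label value."""
    return dict(sorted(Counter(labels).items()))


def task_type(labels: List[int]) -> str:
    """Name the classification task implied by the distinct labels present."""
    n_classes = len(set(labels))
    if n_classes == 2:
        return "binary"
    if n_classes == 3:
        return "3-class"
    return f"{n_classes} distinct labels"


# Hypothetical label columns shaped like the Cardillo and Jankowiak sets.
cardillo_like = [1, 2, 1, 2]
jankowiak_like = [0, 1, 2, 0, 1, 2]

print(task_type(cardillo_like))            # binary
print(task_type(jankowiak_like))           # 3-class
print(class_distribution(jankowiak_like))  # {0: 2, 1: 2, 2: 2}
```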

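## Pairs

### Dataset names and splits

| Original set | Dataset name | Split |
| -------------: | :------------ | :------ |
| Cardillo | `Pairs_Cardillo_random_split` | random |
| | `Pairs_Cardillo_lexical_split` | lexical |
| Jankowiak | `Pairs_Jankowiac_random_split` | random |
| | `Pairs_Jankowiac_lexical_split` | lexical |

### Data fields

| Field | Description | Type |
| -------------: | :------------ | :---- |
| corpus | Name of the original dataset | str |
| id | Instance id | str |
| set_id | Id of the set containing the given instance in the multiple-choice task | int |
| label | 0, 1, 2 | int |
| sentence | A is-a B sentence | str |
| A | A expression in the sentence | str |
| B | B expression in the sentence | str |
| A_position | Position of A in the sentence | list(int) |
| B_position | Position of B in the sentence | list(int) |
| 5-folds | Frozen splits for cross-validation | list(str) |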
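
A quick schema check can catch malformed pair records before training. The sketch below validates a plain dictionary against the field table above. Everything about the example record is hypothetical except the sentence, which comes from the dataset card: the id, the choice of `A`, the position values and the fold entries are placeholders, since the card does not say whether positions are token or character indices.

```python
from typing import Any, Dict, List

# Field names and Python types derived from the table above.
PAIR_FIELDS = {
    "corpus": str, "id": str, "set_id": int, "label": int,
    "sentence": str, "A": str, "B": str,
    "A_position": list, "B_position": list, "5-folds": list,
}


def validate_pair(record: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in a pair record (empty if none)."""
    problems = []
    for field, expected in PAIR_FIELDS.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            problems.append(f"{field} should be {expected.__name__}")
    label = record.get("label")
    if isinstance(label, int) and label not in (0, 1, 2):
        problems.append(f"label must be 0, 1 or 2, got {label}")
    return problems


example = {
    "corpus": "Cardillo",
    "id": "hypothetical-id",                  # placeholder id
    "set_id": 0,
    "label": 2,
    "sentence": "The politician's career was a comet.",
    "A": "career",                            # assumed A expression
    "B": "comet",
    "A_position": [2],                        # placeholder positions
    "B_position": [5],
    "5-folds": ["train", "train", "validation", "test", "train"],
}

print(validate_pair(example))                   # []
print(validate_pair(dict(example, label=12)))   # ['label must be 0, 1 or 2, got 12']
```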

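## Quadruples

### Dataset names and splits

| Original set | Dataset name | Split |
| -------: | :------------------------------------- | :-------- |
| Green | `Quadruples_Green_random_split` | random |
| | `Quadruples_Green_lexical_split` | lexical |
| Kmiecik | `Quadruples_Kmiecik_random_split` | random |
| | `Quadruples_Kmiecik_lexical_split_on_AB` | lexical AB |
| | `Quadruples_Kmiecik_lexical_split_on_CD` | lexical CD |
| SAT | `Quadruples_SAT_random_split` | random |
| | `Quadruples_SAT_lexical_split` | lexical |

### Data fields

| Field | Description | Type |
| -------------: | :------------ | :------------ |
| corpus | Name of the original dataset | str |
| id | Element id | str |
| set_id | Id of the set containing the given instance in the multiple-choice task datasets | int |
| label | 0, 1, 2, 12 | int |
| AB | Pair of terms | list(str) |
| CD | Pair of terms | list(str) |
| 5-folds | Frozen splits for cross-validation | list(str) |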
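
The sketch below shows one way to work with quadruple records: pick the records belonging to one cross-validation fold, then spell each one out as "A is to B what C is to D". It assumes that entry k of `5-folds` names the split ("train", "validation" or "test") of the record in fold k, which the source does not confirm. The word pairs and ids are invented for illustration.

```python
from typing import Any, Dict, List, Sequence


def verbalize(ab: Sequence[str], cd: Sequence[str]) -> str:
    """Render a quadruple <(A,B),(C,D)> as an analogy statement."""
    if len(ab) != 2 or len(cd) != 2:
        raise ValueError("AB and CD must each contain exactly two terms")
    return f"{ab[0]} is to {ab[1]} what {cd[0]} is to {cd[1]}"


def select_fold(records: List[Dict[str, Any]], fold: int, split: str) -> List[Dict[str, Any]]:
    """Keep records whose `5-folds` entry for `fold` equals `split`.

    Assumption: `5-folds` holds one split name per fold, indexed from 0.
    """
    if not 0 <= fold < 5:
        raise ValueError("fold must be in 0..4")
    return [r for r in records if r["5-folds"][fold] == split]


# Hypothetical records, not taken from the dataset.
records = [
    {"id": "q0", "label": 1, "AB": ["bird", "nest"], "CD": ["bee", "hive"],
     "5-folds": ["train", "test", "train", "train", "validation"]},
    {"id": "q1", "label": 2, "AB": ["bird", "nest"], "CD": ["idea", "mind"],
     "5-folds": ["test", "train", "train", "validation", "train"]},
    {"id": "q2", "label": 0, "AB": ["bird", "nest"], "CD": ["spoon", "cloud"],
     "5-folds": ["train", "train", "test", "train", "train"]},
]

for record in select_fold(records, fold=0, split="train"):
    print(verbalize(record["AB"], record["CD"]), "-> label", record["label"])
# bird is to nest what bee is to hive -> label 1
# bird is to nest what spoon is to cloud -> label 0
```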

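# Datasets for multiple choice questions or permutation

- Task: one stem and multiple choices. The stem is combined with each of its candidate completions to form a sentence. Each resulting sentence has a label <0,1,2>.

## Pairs

### Dataset names and splits

| Original set | Dataset name | Split |
| -----------: | :------ | :---- |
| Cardillo | `Pairs_Cardillo_set` | test only |
| Jankowiak | `Pairs_Jankowiac_set` | test only |

### Data fields

| Field | Description | Type |
| -------------: | :------------ | :------------ |
| corpus | Name of the original dataset | str |
| id | Element id | str |
| pair_ids | Ids of each pair as appearing in the classification datasets | list(str) |
| labels | 0, 1, 2 | list(int) |
| sentences | List of the sentences composing the set | list(str) |
| A_positions | Positions of A in each sentence | list(list(int)) |
| B_positions | Positions of B in each sentence | list(list(int)) |
| answer | Index of the metaphor | int |
| stem | Term shared between the sentences of the set | str |
| 5-folds | Frozen splits for cross-validation | list(str) |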
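
Since `answer` is described as the index of the metaphor, a set can be sanity-checked by confirming that `labels[answer]` is 2, and a model can be scored by comparing its chosen index with `answer`. The sketch below uses the two example sets shown in the dataset card (stems "comet" and "harbour"). It assumes `answer` is zero-based, which the source does not state, and only the fields needed here are included.

```python
from typing import Any, Dict, List

METAPHOR = 2  # pair label for a metaphoric sentence

# The two example sets from the dataset card, reduced to the fields used here.
ITEMS: List[Dict[str, Any]] = [
    {
        "stem": "comet",
        "sentences": [
            "The astronomer's obsession was a comet.",
            "The politician's career was a comet.",
        ],
        "labels": [1, 2],
        "answer": 1,  # assumed zero-based index of the metaphor
    },
    {
        "stem": "harbour",
        "sentences": [
            "This banana is like a harbour",
            "A house is a harbour",
            "This area is a harbour",
        ],
        "labels": [0, 2, 1],
        "answer": 1,  # assumed zero-based index of the metaphor
    },
]


def answer_is_consistent(item: Dict[str, Any]) -> bool:
    """Check that `answer` points at the sentence labelled as metaphor."""
    answer = item["answer"]
    return 0 <= answer < len(item["labels"]) and item["labels"][answer] == METAPHOR


def accuracy(predicted: List[int], items: List[Dict[str, Any]]) -> float:
    """Fraction of sets where the predicted index equals `answer`."""
    if len(predicted) != len(items):
        raise ValueError("one prediction per item is required")
    if not items:
        return 0.0
    correct = sum(p == item["answer"] for p, item in zip(predicted, items))
    return correct / len(items)


print(all(answer_is_consistent(item) for item in ITEMS))  # True
print(accuracy([0, 1], ITEMS))                            # 0.5
```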

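## Quadruples

### Dataset names and splits

| Original set | Dataset name | Split |
| ----------: | :------ | :---- |
| Green | `Quadruples_Green_set` | test only |
| SAT | `Quadruples_SAT_met_set` | test only |

### Data fields

| Field | Description | Type |
| -------------: | :------------ | :------------ |
| corpus | Name of the original dataset | str |
| id | Element id | str |
| pair_ids | Ids of the instances as appearing in the classification datasets | list(str) |
| labels | 0, 1, 2, 12 | list(int) |
| answer | temp | int |
| stem | Word pair to compose with all the other pairs of the set | list(str) |
| pairs | List of word pairs | list(list(str)) |
| 5-folds | Frozen splits for cross-validation | list(str) |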
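
A quadruple set can be unfolded into individual quadruples by composing the stem with every candidate pair, which yields records shaped like the classification instances (`AB`, `CD`, `label`). The sketch below assumes that `labels[i]` belongs to `pairs[i]`. It does not use `answer`, because the source describes that field only as "temp". The example set is invented in the style of a Green triple, reading label 1 as the near analogy and label 2 as the far analogy.

```python
from typing import Any, Dict, List


def expand_set(item: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Compose the stem with each candidate pair into a labelled quadruple."""
    if len(item["pairs"]) != len(item["labels"]):
        raise ValueError("pairs and labels must have the same length")
    return [
        {"AB": list(item["stem"]), "CD": list(pair), "label": label}
        for pair, label in zip(item["pairs"], item["labels"])
    ]


# Hypothetical set, not taken from the dataset.
green_like = {
    "stem": ["bird", "nest"],
    "pairs": [["bee", "hive"], ["idea", "mind"], ["spoon", "cloud"]],
    "labels": [1, 2, 0],
}

for quadruple in expand_set(green_like):
    print(quadruple)
# {'AB': ['bird', 'nest'], 'CD': ['bee', 'hive'], 'label': 1}
# {'AB': ['bird', 'nest'], 'CD': ['idea', 'mind'], 'label': 2}
# {'AB': ['bird', 'nest'], 'CD': ['spoon', 'cloud'], 'label': 0}
```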