Joanne/Metaphors_and_Analogies

Name: Joanne/Metaphors_and_Analogies
Creator: Joanne
Published: 2023-05-30 20:40:56
License: No description available

Source: Hugging Face. Updated 2023-05-30. Indexed 2024-03-04.

Download link:

https://hf-mirror.com/datasets/Joanne/Metaphors_and_Analogies

Resource description:

---
task_categories:
- question-answering
- token-classification
language:
- en
---

# Metaphors and analogies datasets

These datasets contain word pairs and quadruples forming analogies, metaphoric mappings or semantically unacceptable compositions.

- Pair instances are pairs of nouns A and B in a sentence of the form "A is a B".
- Quadruple instances are of the form: < (A,B),(C,D) >

There is an analogy when A is to B what C is to D. The analogy is also a metaphor when (A,B) and (C,D) form a metaphoric mapping, usually when they come from different domains.

## Dataset Description

- **Homepage:**
- **Repository:**
- **Paper:**
- **Leaderboard:**
- **Point of Contact:**

Language: English

### Datasets and paper links

| Name | Size | Labels | Description |
| ---------: | :----- | :-------- | :-------------------------------------------------------------------------- |
| `Cardillo` | 260*2 | 1, 2 | Pairs of "A is-a B" sentences composed of one metaphoric and one literal sentence. The two sentences of a given pair share the same B term. |
| `Jankowiak` | 120*3 | 0, 1, 2 | Triples of "A is-a/is-like-a B" sentences with exactly one literal, one semantically abnormal and one metaphoric sentence. |
| `Green` | 40*3 | 0, 1, 2 | Triples of proportional analogies, made of 4 terms <A, B, Ci, Di> each. One stem <A,B> is composed with 3 different <Ci,Di> pairs to form exactly one near analogy, one far analogy and one non-analogic quadruple. |
| `Kmiecik` | 720 | 0, 1, 2 | Quadruples <A,B,C,D> labelled as analogy: True/False and far_analogy: True/False. |
| `SAT-met` | 160?*5 | 0, 1, 2, 12 | One stem pair <A,B> to combine with 5 different pairs <Ci,Di> in an attempt to form proportional analogies. Only one <Ci,Di> forms an analogy with <A,B>. We additionally labelled the analogies as **metaphoric**: True/False. |

| Name | Paper Citation | Paper link | Dataset link |
| ---------: | :------- | :------------------------------ | -----------------------------------------: |
| `Cardillo` | | [Cardillo (2010)](https://link.springer.com/article/10.3758/s13428-016-0717-1) [Cardillo (2017)](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2952404/) | |
| `Jankowiak` | | [Jankowiak (2020)](https://link-springer-com.abc.cardiff.ac.uk/article/10.1007/s10936-020-09695-7) | |
| `Green` | Green, A. E., Kraemer, D. J. M., Fugelsang, J., Gray, J. R., & Dunbar, K. (2010). Connecting Long Distance: Semantic Distance in Analogical Reasoning Modulates Frontopolar Cortex Activity. Cerebral Cortex, 10, 70-76. | [Green (20)]() | |
| `Kmiecik` | Kmiecik, M. J., Brisson, R. J., & Morrison, R. G. (2019). The time course of semantic and relational processing during verbal analogical reasoning. Brain and Cognition, 129, 25-34. | [Kmiecik (20)]() | |
| `SAT-met` | | [Turney (2005)](https://arxiv.org/pdf/cs/0508053.pdf) | |

### Labels :

- Pairs
  - **0** : anomaly
  - **1** : literal
  - **2** : metaphor
- Quadruples :
  - **0** : not an analogy
  - **1** : an analogy but not a metaphor
  - **2** : an analogy and a metaphor or a far analogy
  - **12** : maybe a metaphor, somewhere between 1 and 2

### Dataset Splits

- Both lexical and random splits are available for classification experiments.
- Size of the splits :
  - **train** : 50 %
  - **validation** : 10 %
  - **test** : 40 %
- Additionally, for all datasets, the `5-folds` field gives frozen splits for a five-fold cross-validation experiment with train/val/test = 70/10/20 % of the sets.

# Datasets for Classification

- Task : binary classification or 3-class classification of pairs or quadruples. Each pair or quadruple is to be classified as anomalous, non-metaphoric or metaphoric.
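As a reading aid, the label scheme above can be written down as plain lookup tables. This is a minimal sketch, not part of the dataset's tooling: the names `PAIR_LABELS`, `QUADRUPLE_LABELS`, `label_name` and `to_binary` are hypothetical, and the binary mapping is an assumption, since the card does not say how the binary task is derived from the three classes.

```python
# Label ids as listed in the "Labels" section of the card.
PAIR_LABELS = {0: "anomaly", 1: "literal", 2: "metaphor"}
QUADRUPLE_LABELS = {
    0: "not an analogy",
    1: "analogy, not a metaphor",
    2: "analogy and metaphor, or far analogy",
    12: "maybe a metaphor (between 1 and 2)",
}


def label_name(label: int, kind: str) -> str:
    """Return the readable name of a label; `kind` is "pair" or "quadruple"."""
    table = PAIR_LABELS if kind == "pair" else QUADRUPLE_LABELS
    return table[label]


def to_binary(label: int) -> int:
    """Hypothetical binary target: 1 for label 2, 0 for everything else.

    Assumption: the card does not define the binary task, and the
    ambiguous quadruple label 12 is mapped to 0 here.
    """
    return 1 if label == 2 else 0
```

Other binary groupings (for example, treating 12 as positive) are equally plausible and would only change `to_binary`.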
## Pairs

### Datasets names & splits :

| Original set | Dataset name | Split |
|-------------:| :------------ | :------ |
| Cardillo | Pairs\_Cardillo\_random_split | random |
| | Pairs\_Cardillo\_lexical_split | lexical |
| Jankowiak | Pairs\_Jankowiac\_random_split | random |
| | Pairs\_Jankowiac\_lexical_split | lexical |

### Data fields :

| Field | Description | Type |
| -------------:| :------------ | ---- |
| corpus | Name of the original dataset | str |
| id | Instance id | str |
| set_id | Id of the set containing the given instance in the multiple-choice task | int |
| label | 0, 1, 2 | int |
| sentence | A is-a B sentence | str |
| A | A expression in the sentence | str |
| B | B expression in the sentence | str |
| A\_position | Position of A in the sentence | list(int) |
| B\_position | Position of B in the sentence | list(int) |
| 5-folds | Frozen splits for cross-validation | list(str) |

### Examples :

| Name | Example | Label |
| -------: | :------------------------------------- | :-------- |
| Cardillo | | |
| Jankowiak | | |

## Quadruples

### Datasets names & splits

| Original set | Dataset name | Split |
| -------: | :------------------------------------- | :-------- |
| Green | Quadruples\_Green\_random_split | random |
| | Quadruples\_Green\_lexical_split | lexical |
| Kmiecik | Quadruples\_Kmiecik\_random_split | random |
| | Quadruples\_Kmiecik\_lexical\_split\_on\_AB | lexical AB |
| | Quadruples\_Kmiecik\_lexical_split\_on\_CD | lexical CD |
| SAT | Quadruples\_SAT\_random\_split | random |
| | Quadruples\_SAT\_lexical\_split | lexical |

### Data fields :

| Field | Description | Type |
| -------------: | :------------ | :------------ |
| corpus | Name of the original dataset | str |
| id | Element id | str |
| set\_id | Id of the set containing the given instance in the multiple-choice task datasets | int |
| label | 0, 1, 2, 12 | int |
| AB | Pair of terms | list(str) |
| CD | Pair of terms | list(str) |
| 5-folds | Frozen splits for cross-validation | list(str) |

### Examples :

| Name | Example | Label |
|-------: | :------------------------------------- | :-------- |
| Green | | |
| Kmiecik | | |
| SAT | | |

# Datasets for multiple choice questions or permutation

- Task : One stem and multiple choices. The stem and its possible combinations are to be combined to form a sentence. The resulting sentence has a label <0,1,2>.

## Pairs

### Datasets names & splits :

| Original set | Dataset name | Split |
| -----------|------| :---- |
| Cardillo | Pairs\_Cardillo\_set | test only |
| Jankowiak | Pairs\_Jankowiac\_set | test only |

### Data fields :

| Field | Description | Type |
| -------------: | :------------ | :------------ |
| corpus | Name of the original dataset | str |
| id | Element id | str |
| pair_ids | Ids of each pair as appearing in the classification datasets | list(str) |
| labels | 0, 1, 2 | list(int) |
| sentences | List of the sentences composing the set | list(str) |
| A\_positions | Positions of the A's in each sentence | list(list(int)) |
| B\_positions | Positions of the B's in each sentence | list(list(int)) |
| answer | Index of the metaphor | int |
| stem | Term shared between the sentences of the set | str |
| 5-folds | Frozen splits for cross-validation | list(str) |

### Examples :

| Name | Stem | Sentences | Label |
|-------: |-------: | :------------------------------------- | :-------- |
| Cardillo | comet | The astronomer's obsession was a comet. | 1 |
| | | The politician's career was a comet. | 2 |
| Jankowiak | harbour | This banana is like a harbour | 0 |
| | | A house is a harbour | 2 |
| | | This area is a harbour | 1 |

## Quadruples

### Datasets names & splits :

| Original set | Dataset name | Split |
| ----------: | :------| :---- |
| Green | Quadruples\_Green\_set | test only |
| SAT | Quadruples\_SAT\_met_set | test only |

### Data fields :

| Field | Description | Type |
|-------------: | :------------ | :------------ |
| corpus | Name of the original dataset | str |
| id | Element id | str |
| pair\_ids | Ids of the instances as appearing in the classification datasets | list(str) |
| labels | 0, 1, 2, 12 | list(int) |
| answer | temp | int |
| stem | Word pair to compose with all the other pairs of the set | list(str) |
| pairs | List of word pairs | list(list(str)) |
| 5-folds | Frozen splits for cross-validation | list(str) |

### Examples :

| Name | Example | Label |
|-------: | :------------------------------------- | :-------- |
| Green | | |
| | | |
| SAT | | |

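Provider:

Joanne

Summary of original information

Metaphors and analogies datasets

These datasets contain word pairs and quadruples forming analogies, metaphoric mappings or semantically unacceptable compositions.

Pair instances: pairs of nouns A and B in a sentence of the form "A is a B".
Quadruple instances: of the form <(A,B),(C,D)>. There is an analogy when A is to B what C is to D. The analogy is also a metaphor when (A,B) and (C,D) form a metaphoric mapping, usually when they come from different domains.

Dataset description

Datasets and paper links

Name	Size	Labels	Description
`Cardillo`	260*2	1, 2	Pairs of "A is-a B" sentences composed of one metaphoric and one literal sentence. The two sentences of a pair share the same B term.
`Jankowiak`	120*3	0, 1, 2	Triples of "A is-a/is-like-a B" sentences with one literal, one semantically abnormal and one metaphoric sentence.
`Green`	40*3	0, 1, 2	Proportional-analogy quadruples made of 4 terms <A, B, Ci, Di>. One stem <A,B> is combined with 3 different <Ci,Di> pairs to form one near analogy, one far analogy and one non-analogic quadruple.
`Kmiecik`	720	0, 1, 2	Quadruples <A,B,C,D> labelled as analogy: True/False and far analogy: True/False.
`SAT-met`	160?*5	0, 1, 2, 12	One stem pair <A,B> combined with 5 different pairs <Ci,Di> in an attempt to form proportional analogies. Only one <Ci,Di> forms an analogy with <A,B>. The analogies are additionally labelled as metaphoric: True/False.

Dataset splits

Both lexical and random splits are available for classification experiments.
- Size of the splits:
  - train : 50 %
  - validation : 10 %
  - test : 40 %
Additionally, for all datasets, the 5-folds field gives frozen splits for a five-fold cross-validation experiment with train/validation/test = 70/10/20 %.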
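The frozen cross-validation splits could be consumed along the following lines. This is only a sketch under a stated assumption: the card gives the type of `5-folds` as list(str) but not its semantics, so the code assumes that entry i names the split ("train", "validation" or "test") the record belongs to in fold i. The helper `records_for_fold` and the toy records are hypothetical.

```python
from collections import defaultdict


def records_for_fold(records, fold):
    """Group records by the split named at position `fold` of "5-folds".

    Assumption (not confirmed by the card): "5-folds" holds five strings,
    and entry i is the record's split in fold i.
    """
    groups = defaultdict(list)
    for record in records:
        groups[record["5-folds"][fold]].append(record)
    return dict(groups)


# Made-up records, only to show the shape of the data.
toy = [
    {"id": "a", "5-folds": ["train", "train", "test", "train", "validation"]},
    {"id": "b", "5-folds": ["test", "train", "train", "validation", "train"]},
    {"id": "c", "5-folds": ["validation", "test", "train", "train", "train"]},
]

fold0 = records_for_fold(toy, 0)
print({split: [r["id"] for r in rows] for split, rows in fold0.items()})
```

If the field turns out to be encoded differently, only the indexing expression inside the loop needs to change.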

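Datasets for classification

Task: binary or 3-class classification of pairs or quadruples. Each pair or quadruple is classified as anomalous, non-metaphoric or metaphoric.

Pairs

Dataset names and splits

Original set	Dataset name	Split
Cardillo	Pairs_Cardillo_random_split	random
	Pairs_Cardillo_lexical_split	lexical
Jankowiak	Pairs_Jankowiac_random_split	random
	Pairs_Jankowiac_lexical_split	lexical

Data fields

Field	Description	Type
corpus	Name of the original dataset	str
id	Instance id	str
set_id	Id of the set containing the given instance in the multiple-choice task	int
label	0, 1, 2	int
sentence	A is-a B sentence	str
A	A expression in the sentence	str
B	B expression in the sentence	str
A_position	Position of A in the sentence	list(int)
B_position	Position of B in the sentence	list(int)
5-folds	Frozen splits for cross-validation	list(str)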
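To illustrate how the A and B expressions relate to the sentence, the sketch below recomputes character spans from the `sentence`, `A` and `B` fields. It deliberately does not read `A_position` or `B_position`, because the card does not say whether those hold character offsets or token indices. The helper `find_span` is hypothetical, and the `A` and `B` values in the record are guesses made for illustration; only the sentence is taken from the card's examples.

```python
def find_span(sentence, expression):
    """Return [start, end) character offsets of `expression`, or None if absent."""
    start = sentence.find(expression)
    if start == -1:
        return None
    return [start, start + len(expression)]


# Sentence from the card's examples; the A and B values are assumed.
record = {
    "sentence": "The politician's career was a comet.",
    "A": "career",
    "B": "comet",
    "label": 2,
}

a_span = find_span(record["sentence"], record["A"])
b_span = find_span(record["sentence"], record["B"])
print(a_span, b_span)
```

Spans of this kind are what a token-classification setup would typically need to tag the A and B terms.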

Quadruples

Dataset names and splits

Original set	Dataset name	Split
Green	Quadruples_Green_random_split	random
	Quadruples_Green_lexical_split	lexical
Kmiecik	Quadruples_Kmiecik_random_split	random
	Quadruples_Kmiecik_lexical_split_on_AB	lexical AB
	Quadruples_Kmiecik_lexical_split_on_CD	lexical CD
SAT	Quadruples_SAT_random_split	random
	Quadruples_SAT_lexical_split	lexical

Data fields

Field	Description	Type
corpus	Name of the original dataset	str
id	Element id	str
set_id	Id of the set containing the given instance in the multiple-choice task datasets	int
label	0, 1, 2, 12	int
AB	Pair of terms	list(str)
CD	Pair of terms	list(str)
5-folds	Frozen splits for cross-validation	list(str)
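A quadruple record can be turned into the "A is to B what C is to D" reading used in the description with a few lines of code. This is a minimal sketch: `verbalize` is a hypothetical helper, and the record is invented for illustration rather than taken from the data.

```python
def verbalize(ab, cd):
    """Render the AB and CD term pairs as an 'A is to B what C is to D' string."""
    a, b = ab
    c, d = cd
    return f"{a} is to {b} what {c} is to {d}"


# Invented record with the fields listed above (not an actual dataset row).
record = {"AB": ["hand", "glove"], "CD": ["foot", "sock"], "label": 1}

text = verbalize(record["AB"], record["CD"])
print(text, "->", record["label"])
```

A template like this is one way to feed quadruples to a sentence classifier; other phrasings would work just as well.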

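Datasets for multiple-choice questions or permutation

Task: one stem and multiple choices. The stem and its possible combinations are combined to form a sentence. The resulting sentence has a label <0,1,2>.

Pairs

Dataset names and splits

Original set	Dataset name	Split
Cardillo	Pairs_Cardillo_set	test only
Jankowiak	Pairs_Jankowiac_set	test only

Data fields

Field	Description	Type
corpus	Name of the original dataset	str
id	Element id	str
pair_ids	Ids of each pair as appearing in the classification datasets	list(str)
labels	0, 1, 2	list(int)
sentences	List of the sentences composing the set	list(str)
A_positions	Positions of the A's in each sentence	list(list(int))
B_positions	Positions of the B's in each sentence	list(list(int))
answer	Index of the metaphor	int
stem	Term shared between the sentences of the set	str
5-folds	Frozen splits for cross-validation	list(str)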
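One way to score a model on the multiple-choice pair sets is to pick the sentence with the highest metaphoricity score and compare its index with `answer`. The sketch below assumes that `answer` is a zero-based index into `sentences`, which the card does not state. The function names are hypothetical, and the scores are placeholder numbers standing in for a real model; the sentences and labels come from the card's Jankowiak example.

```python
def predict_answer(scores):
    """Index of the highest score, i.e. the sentence judged most metaphoric."""
    return max(range(len(scores)), key=scores.__getitem__)


def accuracy(sets, score_fn):
    """Fraction of sets where the top-scored sentence matches `answer`."""
    correct = sum(
        predict_answer([score_fn(s) for s in item["sentences"]]) == item["answer"]
        for item in sets
    )
    return correct / len(sets)


toy_sets = [
    {
        "stem": "harbour",
        "sentences": [
            "This banana is like a harbour",
            "A house is a harbour",
            "This area is a harbour",
        ],
        "labels": [0, 2, 1],
        "answer": 1,  # assumed zero-based index of the label-2 sentence
    }
]

# Placeholder scores standing in for a real model's output.
toy_scores = {
    "This banana is like a harbour": 0.1,
    "A house is a harbour": 0.8,
    "This area is a harbour": 0.3,
}

acc = accuracy(toy_sets, toy_scores.get)
print(acc)
```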

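Quadruples

Dataset names and splits

Original set	Dataset name	Split
Green	Quadruples_Green_set	test only
SAT	Quadruples_SAT_met_set	test only

Data fields

Field	Description	Type
corpus	Name of the original dataset	str
id	Element id	str
pair_ids	Ids of the instances as appearing in the classification datasets	list(str)
labels	0, 1, 2, 12	list(int)
answer	temp	int
stem	Word pair to compose with all the other pairs of the set	list(str)
pairs	List of word pairs	list(list(str))
5-folds	Frozen splits for cross-validation	list(str)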
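A quadruple set can be unrolled into individual quadruples by pairing the stem with each candidate pair. This is a sketch under the assumption that `pairs` and `labels` are aligned by position, which the field list suggests but does not state. `expand_set` is a hypothetical helper and the toy set is invented for illustration.

```python
def expand_set(item):
    """Pair the stem with each candidate pair, giving (AB, CD, label) triples.

    Assumption: item["pairs"][i] and item["labels"][i] describe the same candidate.
    """
    return [
        (item["stem"], pair, label)
        for pair, label in zip(item["pairs"], item["labels"])
    ]


# Invented set in the shape of the fields above (not an actual dataset row).
toy_set = {
    "stem": ["nose", "scent"],
    "pairs": [["tongue", "taste"], ["antenna", "signal"], ["desk", "cloud"]],
    "labels": [1, 2, 0],
}

quadruples = expand_set(toy_set)
for ab, cd, label in quadruples:
    print(ab, cd, label)
```

The `answer` field is left out on purpose, since the card only describes it as "temp".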
