WiC
Source: Opencsg. Updated 2024-04-08; indexed 2024-06-22.
Download link:
https://www.opencsg.com/datasets/OpenDataLab/WiC
Resource summary:
An ambiguous word can refer to several, possibly unrelated, meanings depending on its context. Mainstream static word embeddings such as Word2vec and GloVe assign each word a single vector and therefore cannot capture this context-dependent behavior. Contextualized word embeddings attempt to address this limitation by computing dynamic word representations that adapt to the surrounding context. The task on the WiC dataset is to identify the intended meaning of a word, and it is framed as binary classification. Each instance contains a target word w, either a verb or a noun, together with two contexts, and each context triggers a specific meaning of w. The system must decide whether the two occurrences of w correspond to the same meaning. The dataset can therefore also be viewed as a practical application of word sense disambiguation.
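The instance layout and the binary decision described above can be sketched in a few lines of Python. This is only an illustrative sketch: the `WiCInstance` class, its field names, the word-overlap heuristic, the `0.2` threshold, and the example sentences are all assumptions made here, not the dataset's actual schema or any published baseline.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class WiCInstance:
    """Hypothetical container for one WiC example (field names are assumed)."""
    target: str      # target word w (a verb or a noun)
    context1: str    # first context containing w
    context2: str    # second context containing w
    label: bool      # True if w has the same meaning in both contexts


def context_words(sentence: str, target: str) -> set[str]:
    """Lowercased tokens of the sentence, excluding the target word itself."""
    tokens = {tok.strip(".,;:!?\"'").lower() for tok in sentence.split()}
    tokens.discard("")
    tokens.discard(target.lower())
    return tokens


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard overlap |a & b| / |a | b|, defined as 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def predict_same_sense(inst: WiCInstance, threshold: float = 0.2) -> bool:
    """Toy heuristic: predict 'same meaning' when the contexts share enough words."""
    overlap = jaccard(
        context_words(inst.context1, inst.target),
        context_words(inst.context2, inst.target),
    )
    return overlap >= threshold


# Made-up examples for illustration; they are not taken from the dataset.
different = WiCInstance(
    "bank", "He sat on the bank of the river.", "She deposited cash at the bank.", False
)
same = WiCInstance(
    "bank", "She deposited cash at the bank.", "He withdrew cash at the bank.", True
)

if __name__ == "__main__":
    for inst in (different, same):
        print(inst.target, predict_same_sense(inst), inst.label)
```

A word-overlap heuristic like this ignores meaning entirely; it is shown only to make the input and output of the task concrete. A realistic system would replace `predict_same_sense` with a comparison of contextualized representations of w in the two contexts.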
Created:
2024-04-08




