WiC (Words in Context)
OpenDataLab. Updated: 2026-05-24. Indexed: 2024-05-09.
Download link:
https://opendatalab.org.cn/OpenDataLab/WiC
Description:
Depending on its context, an ambiguous word can refer to several, possibly unrelated, meanings. Mainstream static word embeddings such as Word2vec and GloVe assign each word a single vector and so cannot capture this context-dependent behavior. Contextualized word embeddings try to address this limitation by computing dynamic word representations that adapt to the surrounding context. WiC is a benchmark for such systems: the task is to identify the intended meaning of a word, framed as binary classification. Each instance contains a target word w, either a verb or a noun, together with two contexts, each of which triggers a specific meaning of w. The system must decide whether the two occurrences of w correspond to the same meaning. The dataset can therefore also be viewed as a practical application of word sense disambiguation.
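The instance structure described above can be sketched in Python. This is a minimal illustration, not the official data format: the class name `WiCInstance`, its field names, and the two example sentences are assumptions made up for this sketch and are not drawn from the dataset files.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class WiCInstance:
    """One WiC-style example (field names are hypothetical)."""
    target: str       # the target word w
    pos: str          # "N" for noun or "V" for verb
    context_1: str    # first sentence containing w
    context_2: str    # second sentence containing w
    same_sense: bool  # gold label: does w mean the same thing in both contexts?


def accuracy(predict: Callable[[WiCInstance], bool],
             instances: Iterable[WiCInstance]) -> float:
    """Fraction of instances whose predicted label matches the gold label."""
    items = list(instances)
    if not items:
        raise ValueError("need at least one instance")
    correct = sum(predict(x) == x.same_sense for x in items)
    return correct / len(items)


def always_same(instance: WiCInstance) -> bool:
    """Trivial baseline: always answer 'same meaning'."""
    return True


# Made-up illustrative instances, NOT taken from the WiC dataset.
EXAMPLES = [
    WiCInstance("bank", "N",
                "She sat on the bank of the river.",
                "He deposited the check at the bank.",
                False),
    WiCInstance("run", "V",
                "They run every morning before work.",
                "I run along the canal on weekends.",
                True),
]

if __name__ == "__main__":
    print(accuracy(always_same, EXAMPLES))
```

Because the task is binary classification, any system can be plugged in as the `predict` callable and scored with the same `accuracy` helper.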
Provider:
OpenDataLab
Created:
2022-04-28
Collected summary
Dataset introduction

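Background and challenges
Background overview
WiC (Words in Context) is a dataset for evaluating context-sensitive representations of word meaning, focused on word sense disambiguation. It is framed as binary classification: decide whether a target word has the same meaning in two different contexts. It targets the inability of static word embeddings to reflect context-dependent semantics, and it is suited to evaluating language models and word embeddings.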
The summary above was collected and generated by 遇见数据集.
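One simple way to evaluate word embeddings on a task of this shape is to compare the two contextual vectors of the target word and threshold their cosine similarity. The sketch below assumes the vectors have already been produced by some contextual encoder, which is left out here. The function names, the toy vectors, and the default threshold of 0.5 are all assumptions for illustration; in practice the threshold would be tuned on held-out data.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def predict_same_sense(vec_1: Sequence[float],
                       vec_2: Sequence[float],
                       threshold: float = 0.5) -> bool:
    """Predict 'same meaning' when the two vectors are similar enough.

    vec_1 and vec_2 stand for the target word's representation in each
    context; the threshold value is a hypothetical default.
    """
    return cosine_similarity(vec_1, vec_2) >= threshold


if __name__ == "__main__":
    # Toy 2-d vectors, chosen only to exercise both outcomes.
    print(predict_same_sense([1.0, 0.0], [0.9, 0.1]))  # nearly parallel
    print(predict_same_sense([1.0, 0.0], [0.0, 1.0]))  # orthogonal
```

A static embedding gives the target word the same vector in both contexts, so this classifier would always see a similarity of 1 and answer "same meaning"; that is the limitation the dataset is designed to expose.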



