RUSSE (Russian Words in Context (based on RUSSE))

Name: RUSSE (Russian Words in Context (based on RUSSE))
Creator: OpenDataLab
Published: 2026-05-31 08:30:18
License: No description available

OpenDataLab. Updated: 2026-05-31. Indexed: 2024-05-09.

Download link:

https://opendatalab.org.cn/OpenDataLab/RUSSE

Resource description:

WiC (Word-in-Context) is a reliable benchmark for evaluating context-sensitive word embeddings. An ambiguous word can refer to several, possibly unrelated, meanings depending on its context. Mainstream static word embeddings such as Word2vec and GloVe cannot capture this dynamic semantic property. Contextualized word embeddings try to address this limitation by computing dynamic word representations that adapt to the context. The Russian SuperGLUE task borrows its original data from the Russe project, specifically the Word Sense Induction and Disambiguation shared task (2018).

Task type: Reading Comprehension. Binary classification: true/false.

Example:

  {
    "idx": 8,
    "word": "дорожка",
    "sentence1": "Бурые ковровые дорожки заглушали шаги",
    "sentence2": "Приятели решили выпить на дорожку в местном баре",
    "start1": 15,
    "end1": 23,
    "start2": 26,
    "end2": 34,
    "label": false,
    "gold_sense1": 1,
    "gold_sense2": 2
  }

How did we collect the data? All text examples come from the original Russe dataset, which was collected for the Russian semantic evaluation at ACL SIGSLAV. Human evaluation was conducted on Yandex.Toloka. In version 2, we manually collected a test set in the same format.
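The example record can be read with a few lines of Python. The sketch below is illustrative rather than an official loader: the helper name target_span is hypothetical, and stripping whitespace is an assumption based only on this one record, where the end offsets reach one character past the target word. The comparison of gold_sense1 and gold_sense2 with the label is likewise an observation about this example, not a documented rule.

```python
import json

# Record copied from the example in the resource description.
RECORD_JSON = """
{
  "idx": 8,
  "word": "дорожка",
  "sentence1": "Бурые ковровые дорожки заглушали шаги",
  "sentence2": "Приятели решили выпить на дорожку в местном баре",
  "start1": 15, "end1": 23,
  "start2": 26, "end2": 34,
  "label": false,
  "gold_sense1": 1,
  "gold_sense2": 2
}
"""


def target_span(record, which):
    """Return the target word form in sentence `which` (1 or 2).

    Hypothetical helper: it slices the sentence by the character offsets
    and strips whitespace, because in this record the end offset covers
    the space that follows the word.
    """
    sentence = record[f"sentence{which}"]
    start = record[f"start{which}"]
    end = record[f"end{which}"]
    return sentence[start:end].strip()


record = json.loads(RECORD_JSON)
form1 = target_span(record, 1)
form2 = target_span(record, 2)

# In this example the label is false and the two gold senses differ.
same_sense = record["gold_sense1"] == record["gold_sense2"]

print(form1, form2, record["label"], same_sense)
```

Both sentences contain an inflected form of the lemma in the "word" field, so a model has to judge the sense from the surrounding context rather than from the surface form.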

Provided by:

OpenDataLab

Created:

2022-06-28

Collected summary

Dataset introduction