Slovenian Word in Context dataset SloWiC 1.0

Source: SSH Open MarketPlace | Updated: 2025-07-04 | Indexed: 2025-07-05

Download link:

https://marketplace.sshopencloud.eu/dataset/ven3dH

Description:

The SloWIC dataset is a Slovenian dataset for the Word in Context (WiC) task. Each example consists of a target word that has multiple meanings and two sentences that both contain that word. Each example is labelled according to whether the two sentences use the target word in the same meaning. The dataset contains 1,808 manually annotated sentence pairs and an additional 13,150 automatically annotated pairs intended to help train larger models. It is stored in JSON format, following the format used in the [SuperGLUE version](https://github.com/clarinsi/classla) of the Word in Context task. Each example contains the following data fields:

* word: The target word with multiple meanings
* sentence1: The first sentence containing the target word
* sentence2: The second sentence containing the target word
* idx: The index of the example in the dataset
* label: Label indicating whether both sentences use the same meaning of the target word
* start1: Start position of the target word in the first sentence
* start2: Start position of the target word in the second sentence
* end1: End position of the target word in the first sentence
* end2: End position of the target word in the second sentence
* version: The version of the annotation
* manual_annotation: Boolean indicating whether the label was manually annotated
* group: The group of annotators that labelled the example
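The field list above maps naturally onto a small loader. The following is a minimal Python sketch, not an official reader for SloWIC. It assumes that the file holds one JSON object per line (JSON Lines, as in the SuperGLUE WiC release) and that `start*`/`end*` are character offsets with an exclusive end. Both assumptions should be checked against the actual files. The helper names (`load_examples`, `target_spans`, `split_by_annotation`) and the demo record are hypothetical. The record's Slovenian sentences are invented for illustration ("klop" can mean "bench" or "tick") and are not taken from the dataset.

```python
import json
from typing import Dict, Iterable, List, Tuple

# Field names as listed in the dataset description.
FIELDS = ("word", "sentence1", "sentence2", "idx", "label",
          "start1", "start2", "end1", "end2",
          "version", "manual_annotation", "group")


def load_examples(lines: Iterable[str]) -> List[Dict]:
    """Parse one JSON object per non-empty line (JSON Lines layout is an assumption)."""
    examples = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        missing = [f for f in FIELDS if f not in record]
        if missing:
            raise ValueError(f"example {record.get('idx')} lacks fields: {missing}")
        examples.append(record)
    return examples


def target_spans(example: Dict) -> Tuple[str, str]:
    """Slice the target word out of both sentences.

    Assumes character offsets with an exclusive end, as in SuperGLUE WiC.
    """
    span1 = example["sentence1"][example["start1"]:example["end1"]]
    span2 = example["sentence2"][example["start2"]:example["end2"]]
    return span1, span2


def split_by_annotation(examples: List[Dict]) -> Tuple[List[Dict], List[Dict]]:
    """Separate manually annotated pairs from automatically annotated ones."""
    manual = [e for e in examples if e["manual_annotation"]]
    automatic = [e for e in examples if not e["manual_annotation"]]
    return manual, automatic


# Hypothetical demo record (invented, not from SloWIC): "klop" as bench vs. tick.
demo = {"word": "klop",
        "sentence1": "Sedel je na klop v parku.", "start1": 12, "end1": 16,
        "sentence2": "V travi je bil klop.", "start2": 15, "end2": 19,
        "idx": 0, "label": False, "version": 1.0,
        "manual_annotation": True, "group": "hypothetical"}

examples = load_examples([json.dumps(demo, ensure_ascii=False)])
manual, automatic = split_by_annotation(examples)
print(target_spans(examples[0]))  # ('klop', 'klop')
print(len(manual), len(automatic))  # 1 0
```

With a real file, you could pass an open handle directly, for example `load_examples(open(path, encoding="utf-8"))`, where `path` points to a downloaded split. Slicing the target word back out of each sentence is a quick way to confirm the offset convention before training a model. Splitting on `manual_annotation` lets you keep the 1,808 manually labelled pairs for evaluation and use the automatic pairs only for training.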

Created: 2025-07-04