
Slovenian Word in Context dataset SloWiC 1.0

Source: SSH Open MarketPlace. Updated 2025-07-04; indexed 2025-07-05.
Download link:
https://marketplace.sshopencloud.eu/dataset/ven3dH
Description:
The SloWiC dataset is a Slovenian dataset for the Word in Context task. Each example contains a target word with multiple meanings and two sentences that both contain the target word. Each example is also annotated with a label indicating whether both sentences use the target word in the same sense. The dataset contains 1808 manually annotated sentence pairs and an additional 13150 automatically annotated pairs to help with training larger models.

The dataset is stored in JSON format, following the format used in the [SuperGLUE version](https://github.com/clarinsi/classla) of the Word in Context task. Each example contains the following data fields:

* word: The target word with multiple meanings
* sentence1: The first sentence containing the target word
* sentence2: The second sentence containing the target word
* idx: The index of the example in the dataset
* label: Label showing whether the sentences use the same meaning of the target word
* start1: Start of the target word in the first sentence
* start2: Start of the target word in the second sentence
* end1: End of the target word in the first sentence
* end2: End of the target word in the second sentence
* version: The version of the annotation
* manual_annotation: Boolean showing whether the label was manually annotated
* group: The group of annotators that labelled the example
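
The data fields listed above can be read with a few lines of Python. The following is a minimal sketch, not the dataset's official loader. It assumes that examples are stored one JSON object per line (as in the SuperGLUE WiC files), that `label` is a JSON boolean, and that `start`/`end` are character offsets with an exclusive end. None of these details are confirmed by the description, so check them against the downloaded files. The sample record, including its `version` and `group` values, is invented for illustration and is not taken from SloWiC; the helper names `load_examples` and `target_spans` are hypothetical.

```python
import json

# Invented record for illustration only; NOT an actual SloWiC example.
# Assumption: one JSON object per line, boolean label, character offsets.
SAMPLE_JSONL = '''
{"word": "list", "sentence1": "List je padel z drevesa.", "sentence2": "Podaj mi list papirja.", "idx": 0, "label": false, "start1": 0, "end1": 4, "start2": 9, "end2": 13, "version": "1.0", "manual_annotation": true, "group": "A"}
'''


def load_examples(lines):
    """Parse an iterable of JSON lines into a list of dicts, skipping blanks."""
    return [json.loads(line) for line in lines if line.strip()]


def target_spans(example):
    """Return the target word as it appears in each sentence.

    Assumes start/end are character offsets and end is exclusive.
    """
    first = example["sentence1"][example["start1"]:example["end1"]]
    second = example["sentence2"][example["start2"]:example["end2"]]
    return first, second


examples = load_examples(SAMPLE_JSONL.splitlines())

# Keep only manually annotated pairs, e.g. for evaluation.
manual = [ex for ex in examples if ex["manual_annotation"]]

for ex in manual:
    print(ex["word"], target_spans(ex), "same sense:", ex["label"])
```

To read a real file, pass an open file handle to `load_examples` in place of `SAMPLE_JSONL.splitlines()`. Filtering on `manual_annotation` separates the manually annotated pairs from the automatically annotated ones intended for training.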
Created:
2025-07-04