Swedish STS-B Dataset

Name: Swedish STS-B Dataset
Creator: Swedish Agency for Innovation Systems
Published: 2020-11-29 18:04:27
License: No description available

Source: arXiv. Updated: 2020-11-29. Indexed: 2024-06-21.

Download link:

https://github.com/timpal0l/sts-benchmark-swedish

Overview:

The Swedish STS-B dataset is the first Swedish-language benchmark for semantic textual similarity (STS). It was created with support from the Swedish Agency for Innovation Systems by translating the original English STS-B dataset into Swedish with the Google machine translation API. Each sentence pair carries a human-assigned similarity score ranging from 0.00 (least similar) to 5.00 (most similar). The benchmark is intended mainly for comparing existing Swedish text representation models. Because construction relies almost entirely on machine translation, the process is simple, but it also introduces translation errors and shifts in vocabulary. The dataset targets semantic textual similarity tasks in natural language processing and aims to address the lack of high-quality evaluation data for Swedish.
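To make the comparison of text representation models concrete, the following is a minimal sketch of how a model could be scored against the 0.00 to 5.00 gold ratings: compute the cosine similarity of each sentence pair's embeddings, then correlate those similarities with the human scores. Pearson correlation is a common choice for STS benchmarks, but the description above does not state which metric is used, so treat it as an assumption. The embeddings and gold scores here are made-up toy values, and `evaluate` is a hypothetical helper, not part of the dataset repository.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def evaluate(pairs, gold_scores):
    """Correlate predicted similarities with gold scores (hypothetical helper).

    pairs: list of (embedding_a, embedding_b) tuples, one per sentence pair.
    gold_scores: human similarity ratings in the range 0.00 to 5.00.
    """
    predicted = [cosine(a, b) for a, b in pairs]
    return pearson(predicted, gold_scores)


# Toy data for illustration only; real embeddings would come from the
# Swedish text representation model under evaluation.
toy_pairs = [
    ([1.0, 0.0], [1.0, 0.0]),  # same direction
    ([1.0, 0.0], [1.0, 1.0]),  # partially aligned
    ([1.0, 0.0], [0.0, 1.0]),  # orthogonal
]
toy_gold = [5.0, 2.5, 0.0]

score = evaluate(toy_pairs, toy_gold)
print(f"Pearson correlation on toy data: {score:.4f}")
```

A higher correlation means the model's similarity ranking agrees more closely with the human ratings. Because the Swedish sentences are machine-translated, differences between models may partly reflect how they handle translation errors.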

Provider:

Swedish Agency for Innovation Systems

Created:

2020-09-07