mrp/Thai-Semantic-Textual-Similarity-Benchmark

Hugging Face · Updated 2021-11-29 · Indexed 2024-03-04

Download link:

https://hf-mirror.com/datasets/mrp/Thai-Semantic-Textual-Similarity-Benchmark

Overview:

The Thai Sentence Vector Benchmark is a dataset for evaluating Thai sentence representation models. It was produced by translating STS-B (Semantic Textual Similarity Benchmark) into Thai, and the quality of sentence representations is measured with Spearman's rank correlation coefficient. The benchmark was created to address the shortage of Thai Natural Language Inference (NLI) and STS datasets and to support the training of Thai sentence representation models.

Provider:

mrp

Summary of original information

Thai Semantic Textual Similarity Benchmark

Dataset description

Source: This dataset is a Thai translation of STS-B, produced with Google Translate.
File: sts-test_th.csv

Evaluation

Local evaluation: Run SentEval.ipynb to evaluate sentence representations.
Google Colab evaluation: The same notebook can be run on Google Colab at https://colab.research.google.com/github/mrpeerat/Thai-Sentence-Vector-Benchmark/blob/main/SentEval.ipynb.
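
The scoring idea can be sketched in a few lines of plain Python. This is a minimal illustration and not the code in SentEval.ipynb: the function names are hypothetical, the vectors are toy values rather than real embeddings, and the use of cosine similarity as the predicted score is an assumption (the listing only states that Spearman's correlation is used). Each sentence pair gets a predicted similarity, and that list is rank-correlated with the gold scores, then multiplied by 100 to match the scale of the results table.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def spearman_x100(predicted, gold):
    """Spearman's rho (Pearson correlation of the ranks), scaled by 100."""
    return 100 * pearson(average_ranks(predicted), average_ranks(gold))


# Toy stand-ins for the embeddings of each sentence in four pairs (assumed values).
emb_a = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 2.0]]
emb_b = [[1.0, 0.1], [1.0, 0.0], [1.0, 0.0], [-1.0, 0.0]]
gold = [5.0, 3.2, 0.4, 1.0]  # toy human similarity scores

predicted = [cosine(a, b) for a, b in zip(emb_a, emb_b)]
score = spearman_x100(predicted, gold)
print(f"{score:.2f}")  # prints 80.00
```

Because only the rank order matters, a model is not penalized for producing similarities on a different scale from the gold scores; it only needs to order the pairs the same way.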

Model performance

Model	Spearman's correlation (×100)	Supervised
simcse-model-distil-m-bert	38.84
simcse-model-m-bert-thai-cased	39.26
simcse-model-roberta-base-thai	62.60
distiluse-base-multilingual-cased-v2	63.50	✓
paraphrase-multilingual-mpnet-base-v2	80.11	✓
