ruMTEB

arXiv, indexed 2025-09-30

Download link:

https://github.com/embeddings-benchmark/mteb/tree/1.12.75

Description:

The ruMTEB benchmark is the Russian-language version of the Massive Text Embedding Benchmark (MTEB). It covers tasks in seven categories, including semantic textual similarity (STS), text classification, reranking, and retrieval. The benchmark evaluates a representative set of Russian and multilingual embedding models, and the results show that the newly introduced model ru-en-RoSBERTa performs on par with existing state-of-the-art models on Russian-language tasks.
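To make the STS category concrete: MTEB-style benchmarks commonly score an STS task as the Spearman rank correlation between the cosine similarity of each sentence pair's embeddings and the human similarity ratings. The sketch below implements that metric in plain Python. It is an illustration under that assumption, not code from the mteb repository, and the helper names (`cosine`, `ranks`, `spearman`, `sts_score`) and the toy vectors are hypothetical.

```python
import math


def cosine(u, v):
    # Cosine similarity between two embedding vectors (assumes non-zero vectors)
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def ranks(values):
    # 1-based ranks; tied values share the mean of their positions
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x, y):
    # Spearman correlation = Pearson correlation computed on the ranks
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def sts_score(pairs, gold):
    # pairs: list of (embedding_a, embedding_b); gold: human similarity ratings
    predicted = [cosine(a, b) for a, b in pairs]
    return spearman(predicted, gold)


# Hypothetical 2-d "embeddings" for three sentence pairs
pairs = [
    ([1.0, 0.0], [1.0, 0.1]),  # near-paraphrase
    ([1.0, 0.0], [0.5, 0.5]),  # partially related
    ([1.0, 0.0], [0.0, 1.0]),  # unrelated
]
gold = [4.8, 2.5, 0.3]
print(round(sts_score(pairs, gold), 6))
```

Because only ranks matter, a model is rewarded for ordering pairs the same way annotators do, regardless of the absolute scale of its similarity values.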
