ruMTEB
Source: arXiv (indexed 2025-09-30)
Download link:
https://github.com/embeddings-benchmark/mteb/tree/1.12.75
Resource summary:
ruMTEB is the Russian-language version of the Massive Text Embedding Benchmark (MTEB). It covers tasks in seven categories, including semantic textual similarity, text classification, reranking, and retrieval. The benchmark evaluates a representative set of Russian and multilingual embedding models. The reported results show that the new ru-en-RoSBERTa model performs on par with current state-of-the-art models on Russian-language tasks.



