MMTEB

Name: MMTEB
Creator: Community-driven, specific contributors not listed.
License: Not specified

Source: arXiv, indexed 2025-09-30

Download link:

https://github.com/embeddings-benchmark/results

Description:

MMTEB is a large-scale, community-driven extension of MTEB. It covers more than 500 quality-controlled evaluation tasks across more than 250 languages, including challenging tasks such as instruction following, long-document retrieval, and code retrieval. It also provides data on performance metrics, runtime, and carbon dioxide emissions. Its core objective is to evaluate the performance of multilingual text embeddings across diverse tasks and languages.

Provider:

Community-driven, specific contributors not listed.
