MMEB-eval
Source: ModelScope community (魔搭社区) · Updated 2026-05-15 · Indexed 2025-02-08
Download link:
https://modelscope.cn/datasets/TIGER-Lab/MMEB-eval
Resource description:
# Massive Multimodal Embedding Benchmark
We compile a large set of evaluation tasks to understand the capabilities of multimodal embedding models. This benchmark covers 4 meta tasks and 36 datasets meticulously selected for evaluation.
The dataset was introduced in our paper [VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks](https://arxiv.org/abs/2410.05160).
## Dataset Usage
Each dataset provides 1000 examples for evaluation. Each example contains a query and a list of candidate targets. Both the query and the targets can be any combination of image and text. The first entry in the candidate list is the ground-truth target.
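The convention that the first candidate is the ground truth makes scoring straightforward once embeddings are available. The following is a minimal sketch, not the benchmark's official evaluation script: it assumes cosine similarity as the scoring function and top-1 accuracy as the metric, and the names `cosine`, `is_hit`, `top1_accuracy`, and the toy vectors are hypothetical stand-ins for the output of a real embedding model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def is_hit(query_vec, candidate_vecs):
    """Return True if the first candidate (the ground truth) scores highest.

    Ties resolve to the lowest index, so a tie with the ground truth
    counts as a hit; a stricter evaluation may want to treat it as a miss.
    """
    scores = [cosine(query_vec, c) for c in candidate_vecs]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best == 0


def top1_accuracy(examples):
    """Fraction of examples whose ground-truth candidate is ranked first.

    `examples` is a list of (query_vec, candidate_vecs) pairs
    (an assumed in-memory layout, not the dataset's actual schema).
    """
    hits = sum(is_hit(q, cands) for q, cands in examples)
    return hits / len(examples)


# Toy embeddings standing in for model outputs (hypothetical values).
toy_examples = [
    # Ground truth [0.9, 0.1] is closest to the query: a hit.
    ([1.0, 0.0], [[0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]),
    # A distractor [0.1, 0.9] is closer than the ground truth: a miss.
    ([0.0, 1.0], [[1.0, 0.0], [0.1, 0.9]]),
]

print(top1_accuracy(toy_examples))
```

In practice the vectors would come from encoding each query and candidate with the model under test, and the accuracy would be computed over all 1000 examples of a dataset.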
## Statistics
We show the statistics of all the datasets as follows:
<img width="900" alt="abs" src="statistics.png">
## Per-dataset Results
The performance of different embedding models on each dataset is listed below:
<img width="900" alt="abs" src="leaderboard.png">
## Submission
We will set up a formal leaderboard soon. If you want to add your results to the leaderboard, please email us at ziyanjiang528@gmail.com.
## Cite Us
```
@article{jiang2024vlm2vec,
title={VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks},
author={Jiang, Ziyan and Meng, Rui and Yang, Xinyi and Yavuz, Semih and Zhou, Yingbo and Chen, Wenhu},
journal={arXiv preprint arXiv:2410.05160},
year={2024}
}
```
Provider: maas
Created: 2025-02-03
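Background overview

MMEB-eval is a large-scale multimodal embedding benchmark comprising 4 meta tasks and 36 datasets for evaluating multimodal embedding models. Each dataset provides 1000 evaluation examples, and both queries and targets can be any combination of image and text.

The summary above was collected and generated by 遇见数据集.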



