MMEB-train

Name: MMEB-train
Creator: maas
Published: 2026-05-08 17:25:39
License: No description available

ModelScope community. Updated 2026-05-08; indexed 2025-02-08.

Download link:

https://modelscope.cn/datasets/TIGER-Lab/MMEB-train

Resource description:

# Massive Multimodal Embedding Benchmark

The training data split used for training VLM2Vec models in the paper [VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks](https://arxiv.org/abs/2410.05160) (ICLR 2025).

MMEB benchmark covers 4 meta tasks and 36 datasets meticulously selected for evaluating capabilities of multimodal embedding models.

During training, we utilize 20 out of the 36 datasets. For evaluation, we assess performance on the 20 in-domain (IND) datasets and the remaining 16 out-of-domain (OOD) datasets. Please refer to [TIGER-Lab/MMEB-eval](https://huggingface.co/datasets/TIGER-Lab/MMEB-eval) for the test split of MMEB.

# News

[2025-01]: We have updated our training data. Each subset now contains two splits: `original` and `diverse_instruction`. The `original` split is provided to support the reproduction of our paper results. The `diverse_instruction` split includes paraphrased instructions for each task, designed to enhance instruction diversity and improve the model's robustness to unseen instructions and tasks. Moving forward, our future releases will primarily use the `diverse_instruction` split.

## Dataset Usage

For each dataset, we have 1000 examples for evaluation. Each example contains a query and a set of targets. Both the query and target could be any combination of image and text. The first one in the candidate list is the groundtruth target.

## Statistics

We show the statistics of all the datasets as follows:

<img width="900" alt="abs" src="https://huggingface.co/datasets/TIGER-Lab/MMEB-eval/resolve/main/statistics.png?download=true">

## Cite Us

```
@article{jiang2024vlm2vec,
  title={VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks},
  author={Jiang, Ziyan and Meng, Rui and Yang, Xinyi and Yavuz, Semih and Zhou, Yingbo and Chen, Wenhu},
  journal={arXiv preprint arXiv:2410.05160},
  year={2024}
}
```
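The Dataset Usage section says that the first entry in each candidate list is the ground-truth target. A minimal sketch of how that convention might be turned into a top-1 score is shown here. It assumes the query and candidates have already been embedded as plain float vectors and that cosine similarity is the ranking function; the helper names (`cosine`, `is_hit_at_1`, `precision_at_1`) and the toy vectors are hypothetical and do not come from the dataset or the VLM2Vec code.

```python
import math
from typing import Sequence, Tuple

Vector = Sequence[float]
Example = Tuple[Vector, Sequence[Vector]]  # (query embedding, candidate embeddings)


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def is_hit_at_1(query: Vector, candidates: Sequence[Vector]) -> bool:
    """True when the best-scoring candidate is index 0 (the ground truth).

    Ties resolve to the lowest index, so an exact tie with index 0 counts as a hit.
    """
    scores = [cosine(query, c) for c in candidates]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best == 0


def precision_at_1(examples: Sequence[Example]) -> float:
    """Fraction of examples whose ground-truth candidate is ranked first."""
    hits = sum(is_hit_at_1(q, cands) for q, cands in examples)
    return hits / len(examples)


# Toy embeddings (assumed, for illustration only).
toy = [
    ([1.0, 0.0], [[0.9, 0.1], [0.0, 1.0]]),  # ground truth ranks first
    ([0.0, 1.0], [[1.0, 0.0], [0.1, 0.9]]),  # a distractor ranks first
]
score = precision_at_1(toy)
print(score)
```

Real embeddings would come from a multimodal encoder applied to the image and text fields of each example; only the index-0 convention is taken from the dataset description.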

Provider:

maas

Created:

2025-02-03

Collection summary

Dataset introduction